Abstract: Intra-session network coding is known to be vulnerable
to pollution attacks. In this work, first, we introduce a
novel homomorphic MAC scheme called SpaceMac, which allows 
an intermediate node to verify if its received packets belong 
to a specific subspace, even if the subspace is expanding over 
time. Then, we use SpaceMac as a building block to design 
a cooperative scheme that provides complete defense against 
pollution attacks: (i) it can detect polluted packets early at 
intermediate nodes and (ii) it can identify the exact location of 
all, even colluding, attackers, thus making it possible to eliminate 
them. Our scheme is cooperative: parents and children of any 
node cooperate to detect any corrupted packets sent by the node, 
and nodes in the network cooperate with a central controller 
to identify the exact location of all attackers. We implement 
SpaceMac in both C/C++ and Java as a library, and we make 
the library available online. Our evaluation on both a PC and 
an Android device shows that (i) SpaceMac's algorithms can be
computed quickly (~ 28 μs in C/C++) and with little memory (~ 64 KB
in C/C++), and (ii) our cooperative defense scheme has both low
computation (~ 190 μs in C/C++) and low communication (~ 2%)
overhead, significantly less than other comparable state-of-the-art
schemes.

Index Terms: Byzantine attacks, pollution attacks, network
coding, attack detection, attack location, homomorphic MAC 

I. Introduction 

THE network coding paradigm advocates that intermediate 
nodes in a network should mix incoming packets instead 
of simply forwarding them, and receivers should decode to 
obtain the original packets. This idea, originally introduced by 
Ahlswede et al. [1], has been shown to bring benefits in terms
of throughput and distributed operation of networks, and has 
received much attention. In this work, we consider networks 
that employ intra-session linear network coding. 

An inherent weakness of network coding is that it is 
particularly vulnerable to pollution (a.k.a. Byzantine) attacks. 
Malicious nodes can inject corrupted packets into a network. 
These packets are combined and forwarded by downstream
nodes, causing a large number of corrupted packets to propagate
in the network. This wastes resources of the network,
such as bandwidth and CPU time, and eventually prevents 
the decoding of the original packets at the receivers. The 
detrimental effect of pollution attacks has been shown through 
both theoretical analysis [2] and experimentation [3], [4].

Proposed defense mechanisms against pollution attacks can 
be classified into three categories: error correction [5]-[8],
attack detection [3], [4], [9]-[18], and locating attackers [19], [20].
In this paper, we are interested in the latter two approaches.
In particular, we set out to design a complete defense system 
that can not only detect the pollution attack in a timely manner 
but also accurately locate and eliminate all pollution attackers. 
This allows for dealing with any attack early and at its root.
To the best of our knowledge, none of the existing defense 
mechanisms can provide this level of protection. 

To this end, we first propose a novel homomorphic message 
authentication code (MAC) scheme for expanding spaces 
called SpaceMac. SpaceMac allows a node to verify if its 
received packets belong to a specific subspace, even if the 
subspace is expanding over time. We then design our novel 
cooperative defense system which includes both a detection 
scheme and a locating scheme, using SpaceMac as their 
building block. Our detection scheme relies on SpaceMac to 
force intermediate nodes to send only linear combinations of 
packets that they actually receive from their parents. Parents 
and children of any intermediate node cooperate to detect 
corrupted packets sent by the intermediate node. Our locat- 
ing scheme uses SpaceMac to force nodes in the network 
to truthfully cooperate with a central controller so that the 
controller can exactly locate the pollution attackers. Finally, 
by leveraging multiple generations, our scheme is able to deal 
with a large number of colluding attackers. 

The main contribution of this paper is twofold: 

• The design and implementation of SpaceMac: We 
describe the construction of SpaceMac and provide a 
formal security proof for the construction. We implement 
SpaceMac in both C/C++ and Java as a ready-to-use 
library. Our Java implementation is compatible with the 
current Android OS (Android 2.2 Froyo). We make the 
library available online [21].

• The design of a novel cooperative defense system
based on SpaceMac: To the best of our knowledge, 
our defense system is the first that meets all of the 
following requirements simultaneously: (i) it can provide 
timely in-network detection, (ii) it can exactly locate all 
pollution attackers, (iii) it can deal with a large number 
of colluding attackers, and (iv) it has low communication 
and computation overhead. 

We have extensively evaluated the computation overhead 
of SpaceMac's algorithms and both the computation and 
communication overhead of our defense scheme through real 
implementation in both C/C++ and Java, and on both a PC and 
an Android device (Samsung Captivate). Our evaluation results 
show that all three algorithms of SpaceMac (Mac, Combine,
and Verify) can be computed efficiently (requiring 64 KB of
memory in C/C++ or 128 KB in Java) and also quickly on a PC
(< 28 μs in C/C++) and even on a smart phone (< 2.3 ms).
Evaluation results also demonstrate that when implementing 
our defense scheme, nodes in the network introduce very small 
computational delay (on the order of sub-milliseconds on the PC
and milliseconds on the smart phone). Moreover, our defense
scheme was shown to introduce very low communication 
overhead (2%), significantly less than other comparable state- 
of-the-art schemes. Lastly, through a simulation in Python, we 
show that in a medium-size network of 50 nodes, our locating 
scheme can quickly locate all, even colluding attackers (20 
attackers in about 1 second). 

The rest of this paper is organized as follows. Section II
discusses existing approaches to protect network coding
against pollution attacks. Section III formulates the problem,
describes the threat model, and discusses our design goals.
Section IV presents our key observations and the overview
of our approach. Section V presents the construction of
SpaceMac and the formal security proof. Section VI describes
our detection scheme. Section VII describes our locating
scheme. Section VIII analyzes the security of our proposed
schemes. Section IX presents our implementation and the
evaluation results. Finally, we conclude in Section X.

II. Related Work 

There are three main approaches in the literature to combat
pollution attacks: error correction, attack detection, and
locating attackers. Below, we discuss each one in detail.

A. Error Correction 

One of the earliest works on error correction for network
coding is by Cai and Yeung [5]. The study in [5] introduces
network error-correcting codes as a generalization of the
traditional error correction codes. In a related study by Zhang
[8], the minimum rank of a network error correction code is
defined; this concept is analogous to the minimum distance
in classical coding theory. Based on this concept, a network
error correction code similar to an ordinary linear network
single-source multicast code is designed. Jaggi et al. [22]
consider packets from an attacker as an additional source and
add redundancy at the source so that the receivers can decode
both sources: the original source and the adversary's source. In
[7], Koetter and Kschischang proposed a coding metric on
subspaces and a minimum distance decoder, which give rise
to codes capable of correcting certain combinations of errors
and erasures.

These information theoretic approaches, which aim at cor- 
recting errors at the receivers, offer only limited security 
against restricted types of adversaries. These approaches as- 
sume that the adversaries can only corrupt a small number of 
edges and packets. Also, the amount of redundancy, which can
also be considered as the communication overhead, typically
increases in proportion to the number of corrupted packets or
adversaries. Furthermore, these approaches do not detect and
drop corrupted packets, and thus are unable to prevent the
corrupted packets from propagating in the network and using
up resources.

In contrast, our defense scheme is able to provide timely 
detection of the attack, thereby allowing for early filtering 
of corrupted packets. More importantly, our approach can 
accurately locate the attackers to eliminate them from the net- 
work. Furthermore, our approach can deal with more powerful 
adversaries, i.e., adversaries who pollute an arbitrary number of
packets and even colluding adversaries. However, we make an 
assumption on the adversaries' computational power, which is 
typical of all approaches utilizing cryptographic primitives. 

B. Attack Detection 

We first describe approaches that do not use homomorphic 
cryptography. In [13], Ho et al. show that randomized network
coding can be extended to provide end-to-end attack detec- 
tion, i.e., allow the receivers to detect any corrupted packet. 
The extension requires the source node to include in each 
source packet some additional hash blocks calculated from the 
source data blocks using polynomial functions. More recently, 
Kehdi and Li [16] propose an in-network detection scheme
which exploits subspace properties of network coding. In their 
scheme, intermediate nodes verify the integrity of a vector 
by checking if it belongs to the subspace spanned by the 
source vectors. Null keys, which are vectors orthogonal to 
all the combinations of the source vectors, are used for the 
verification. This scheme is not collusion resistant: multiple 
nodes can collude to infer the null keys and make benign 
nodes accept polluted vectors. Yu et al. [17] use simple
XOR checksums and exploit probabilistic key pre-distribution 
to provide in-network detection. This scheme, however, has 
significant communication overhead due to the aggregation 
of authentication tags; moreover, the scheme is c-collusion 
resistant for some pre-determined constant c, i.e., the scheme 
becomes vulnerable when there are more than c colluding 
attackers. In [3], Dong et al. design linear transformation
checksums to be used with a time-based authentication scheme
to provide in-network detection. This scheme requires time 
synchronization among nodes in the network and frequent 
public key verification (one per generation). 

A significant number of homomorphic cryptographic primitives,
ranging from hashes and signatures to MACs, have been
designed specifically to combat pollution attacks in network
coding. In [9], Krohn et al. proposed a homomorphic hash
scheme for verification of rateless erasure codes. Gkantsidis
and Rodriguez [4] later propose probabilistic checking and
cooperation mechanisms among nodes to reduce the computation
overhead when using Krohn et al.'s scheme in peer-to-peer
file distribution systems. Li et al. [23] also propose a
hash-based scheme based on a trapdoor one-way permutation,
which can avoid the pre-distribution of the hash blocks. In
[14], Zhao et al. propose a signature scheme where the source
derives authentication information from a vector orthogonal to
the source space. In [15], Charles et al. propose a signature
scheme based on aggregate signatures. Recently, Boneh et
al. present a signature scheme built on bilinear maps [12]. All
of the hash-based and signature-based approaches suffer from
a common drawback: they require expensive computation at
intermediate nodes, either modular exponentiation or bilinear
map evaluation, which results in high latency.

Recently, two homomorphic MAC schemes were proposed by
two groups of researchers: Agrawal and Boneh [10], and Li et
al. [11]. The scheme in [10] relies on cover-free set systems
for pre-distributing keys to provide in-network detection, and
thus is only c-collusion resistant. This scheme is also susceptible
to tag-pollution attacks, where malicious nodes tamper with
some subset of tags of a packet. We discuss this type
of attack in detail in Section VIII-C. The scheme in [11] is
collusion resistant (rather than c-collusion resistant) as well as
resistant against tag-pollution attacks; however, it requires time
synchronization among nodes in the network. Both schemes
have low computation overhead since they only require simple
addition and multiplication operations at intermediate nodes
for both combining MAC tags and tag verification. In a more
recent work [18], Zhang et al. introduce both a homomorphic
MAC and a homomorphic signature scheme and propose a
hybrid approach that uses both. This approach is not susceptible
to tag-pollution attacks but is only c-collusion resistant. This
approach also has lower computation overhead than signature-based
approaches; however, the overhead is still significantly
higher than the other two MAC-based approaches due to the
expensive exponentiation operations required by the signature
scheme.

Our SpaceMac scheme is inspired by the MAC scheme in
[10]; however, our scheme allows intermediate nodes to sign
subspaces that expand over time. This stands in stark contrast
to [10], which allows for signing only fixed subspaces. Our
scheme can be considered as a generalization of the scheme
in [10]. Detailed description and comparison are provided in
Section V. SpaceMac was originally introduced for authenticating
expanding subspaces and locating the attackers in our
preliminary work [24]. In this paper, we show that the ability
to authenticate expanding subspaces can also be utilized to
provide timely in-network detection without requiring time
synchronization as in [11].

C. Locating Attackers 

Compared to the other two categories, locating attackers 
has received less attention. An early work by Jafarisiavoshani
et al. [19] leverages the subspace properties of randomized
network coding to locate pollution attackers. The main
observation is that packets sent by a node have to belong 
to the space spanned by source packets and also the space 
spanned by the packets the node receives from its parents. 
Using this observation, in a general network topology having 
a single attacker, the authors can locate the attacker with an 
uncertainty of at most two nodes; when there are multiple 
attackers, the uncertainty is within a set of nodes including the 
attackers and their parents and children. Our scheme builds on 
and significantly improves this work: we make it possible to 
pinpoint the exact location of the attackers, even in the case 
where there are multiple colluding attackers, thereby allowing 
for the removal of all attackers. 

Recently, Wang et al. [20] introduced a light-weight non-repudiation
protocol ensuring that (i) a malicious node that
injected a polluted packet cannot deny its behavior and (ii)
a malicious node cannot disparage any innocent node. They
build a defense scheme based on the protocol to identify
malicious nodes. The scheme in [20] performs flooding of
multiple checksums of all the packets sent by the source to all 
the nodes, which incurs significant communication overhead. 
Moreover, because the success of the locating scheme relies 
on the successful reception of the checksums at every node, 
this scheme is vulnerable to colluding attackers. Finally, this 
scheme is unable to locate all the attackers. We use their 
non-repudiation protocol as a building block of our locating 
scheme. However, unlike the scheme in [20], our scheme is
able to locate all, even colluding attackers, without the need 
of checksums. 

Finally, compared to our preliminary work [24], where
we first presented SpaceMac and our locating scheme, this 
work is significantly improved and extended by the following 
three novel contributions: (i) we describe a novel construc- 
tion of SpaceMac, whose algorithms are significantly more 
computationally efficient than our previous construction; (ii) we
describe a novel detection scheme built on SpaceMac that 
can provide in-network detection with low overhead; and (iii) 
we implement the SpaceMac library in both C/C++ and Java
and we extensively evaluate both SpaceMac and the proposed 
defense scheme on both a PC and an Android device. 

III. Problem Formulation 

In this section, we describe the notation that we use to 
express network operations in a multicast session with intra- 
session coding. In addition, we describe the threat model and 
the design goals of our defense system. 

A. Network Model and Operation 

We follow the notation used in [10] and [19]. Consider a
fixed, directed acyclic graph (DAG), denoted by $G = (V, E)$.
There is a single source node $S$ that multicasts packets to a
set of receivers, denoted by $\mathcal{R}$. Denote the set of intermediate
nodes as $\mathcal{I}$, i.e., $\mathcal{I} = V \setminus (\mathcal{R} \cup \{S\})$. Nodes in $\mathcal{I}$ perform
generation-based linear network coding. A generation consists
of $m$ packets, $\bar{\mathbf{v}}_1, \dots, \bar{\mathbf{v}}_m$, in an $n$-dimensional linear space
$\mathbb{F}_q^n$, where $m$, $n$, and $q$ are fixed ahead of time and $q \gg 1$. The
source augments every packet $\bar{\mathbf{v}}_i$ with $m$ additional symbols,
which are the coefficients of $\bar{\mathbf{v}}_i$. The resulting packets, $\mathbf{v}_i$'s,
called source packets, have the following form:

$$\mathbf{v}_i = (\bar{\mathbf{v}}_i, \underbrace{0, \dots, 0, 1}_{i}, 0, \dots, 0) \in \mathbb{F}_q^{n+m}.$$

Note that if a packet $\mathbf{y}$ is a linear combination of the
packets $\mathbf{v}_i$'s, then the last $m$ symbols of packet $\mathbf{y}$ contain the
global linear combination coefficients. The source $S$ sends the
packets $\mathbf{v}_i$'s to the network. Denote the subspace spanned by
packets $\mathbf{v}_i$'s by $\Pi^S \subseteq \mathbb{F}_q^{n+m}$. We refer to $\Pi^S$ as the source
space.
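The augmentation step can be illustrated with a minimal Python sketch. The field size `Q`, the toy payloads, and the helper names `augment` and `linear_combination` are assumptions made for illustration only; they are not taken from the SpaceMac library. The sketch shows that the last $m$ symbols of a coded packet carry the global coefficients.

```python
# Hypothetical toy parameters (assumptions for illustration only).
Q = 257  # a small prime field size; the paper only fixes some q


def augment(payloads, q=Q):
    """Append the i-th unit vector of length m to the i-th payload packet."""
    m = len(payloads)
    return [
        [x % q for x in payload] + [1 if j == i else 0 for j in range(m)]
        for i, payload in enumerate(payloads)
    ]


def linear_combination(packets, coeffs, q=Q):
    """Return sum_i coeffs[i] * packets[i], computed symbol-wise mod q."""
    length = len(packets[0])
    return [
        sum(c * p[k] for c, p in zip(coeffs, packets)) % q
        for k in range(length)
    ]


# A generation of m = 3 payload packets, each with n = 4 symbols.
payloads = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
source_packets = augment(payloads)

# A coded packet y = 2*v_1 + 0*v_2 + 5*v_3.
y = linear_combination(source_packets, [2, 0, 5])
print(y[:4])   # payload part of the coded packet
print(y[-3:])  # global coefficients: [2, 0, 5]
```

A receiver that collects $m$ linearly independent coded packets can read the coefficient matrix directly from these trailing symbols and invert it to recover the payloads.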

We use $\mathcal{P}_N$ and $\mathcal{C}_N$ to denote the sets of parents and children
of a node $N$, respectively. Each intermediate node $N$ receives
from $\mathcal{P}_N$ some packets, which are linear combinations of
the source packets. It then creates linear combinations of the
received packets and sends them to its adjacent downstream
nodes. We use $\Pi_N^P(t) \subseteq \mathbb{F}_q^{n+m}$ to denote the space spanned
by the packets received by node $N$ from a parent node $P$,
$P \in \mathcal{P}_N$, up to time $t$. We further use $\Pi_N(t)$ to denote the
space spanned by all the packets received by node $N$ from all
its parents up to time $t$. We also denote the space spanned by
all the packets sent by a node $N$ up to time $t$ as $\Pi^N(t)$. When
there is no ambiguity, we omit the time index $t$. A receiver
$R \in \mathcal{R}$ can successfully decode the original source packets
using Gaussian elimination if its received space $\Pi_R$ equals
the source space $\Pi^S$.

If all the nodes in the network are benign, then the space
spanned by packets sent by $N$, $\Pi^N$, must be a subspace of
the space spanned by the packets that $N$ receives, $\Pi_N$. This is
a property of networks that implement random linear network
coding. This observation was also made in [19]. Formally,

Lemma 1. If every node in the network is benign then for
every node $N$, $\Pi^N(t) \subseteq \Pi_N(t)$.

Also, observe that for any parent $P$ of $N$, both $\Pi_N^P$ and $\Pi_N$
expand over time. Formally, $\Pi_N^P(t_0) \subseteq \Pi_N^P(t_1)$ and $\Pi_N(t_0) \subseteq
\Pi_N(t_1)$, for all $t_0 < t_1$.
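The subspace relation in Lemma 1 can be checked directly when all received packets are known in the clear. The following Python sketch (Python 3.8+ for the modular inverse via `pow`) tests membership in a span over a prime field by Gaussian elimination. It is a plain linear-algebra illustration under assumed toy parameters, not part of SpaceMac: SpaceMac exists precisely because a child of $N$ does not see the packets that $N$ received. The names `row_reduce` and `in_span` and the field size `Q` are hypothetical.

```python
Q = 257  # prime field size (assumption for illustration)


def row_reduce(rows, q=Q):
    """Return a list of (pivot_column, row) pairs spanning the same space over F_q."""
    basis = []
    for row in rows:
        row = [x % q for x in row]
        # Eliminate the pivot columns of the basis rows found so far.
        for piv, b in basis:
            if row[piv]:
                f = row[piv]
                row = [(x - f * y) % q for x, y in zip(row, b)]
        piv = next((i for i, x in enumerate(row) if x), None)
        if piv is None:
            continue  # row was linearly dependent on the basis
        inv = pow(row[piv], -1, q)  # modular inverse, Python 3.8+
        basis.append((piv, [(x * inv) % q for x in row]))
    return basis


def in_span(vector, rows, q=Q):
    """Return True if `vector` lies in the span of `rows` over F_q."""
    residual = [x % q for x in vector]
    for piv, b in row_reduce(rows, q):
        if residual[piv]:
            f = residual[piv]
            residual = [(x - f * y) % q for x, y in zip(residual, b)]
    return not any(residual)


# Packets received by a node (n = 3 payload symbols, m = 2 coefficients).
received = [[1, 2, 3, 1, 0], [4, 5, 6, 0, 1]]
honest = [(3 * a + 2 * b) % Q for a, b in zip(*received)]
polluted = honest[:2] + [(honest[2] + 1) % Q] + honest[3:]

print(in_span(honest, received))    # True
print(in_span(polluted, received))  # False
```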

Furthermore, when all the intermediate nodes in $\mathcal{I}$ are
benign, the incoming spaces of all the intermediate nodes
and the receivers are subspaces of the source space. Assume
that there is a pollution attacker in the network. The attacker
combines a subspace $\Pi^* \not\subseteq \Pi^S$ with its incoming space
and sends the resulting packets to its children; as a result,
the incoming spaces of these children are not subspaces of
$\Pi^S$. Formally, for every node $N$, $N \in V \setminus \{S\}$, its incoming
space from its parent $P$ can be decomposed as follows:

$$\Pi_N^P = \Pi_{N,S}^P \oplus \Pi_{N,*}^P,$$

where $\oplus$ denotes the direct sum of spaces,
$\Pi_{N,S}^P = \Pi^S \cap \Pi_N^P$, and $\Pi_{N,*}^P$ contains packets not belonging to
$\Pi^S$. Table I summarizes the notation used in this paper.

Following the framework in [19], we define a polluted
directed edge as follows: 

Definition 1. A directed edge is polluted if it transmits any 
packet which is not a linear combination of the source packets. 



The following lemma, adapted from [19], is a direct consequence
of the definition:

Lemma 2. A directed edge $e(P, N)$ is polluted if and only if $\Pi_N^P \not\subseteq \Pi^S$.

B. Threat Model 

Fig. 1. Cooperative defense system's main components, their exploited
subspace relationships, and their properties.

We assume that both the source and the receivers are
trustworthy but the intermediate nodes may be malicious. The
case when the receivers are malicious is discussed separately in
Section VIII-E. We assume that the network may have multiple
pollution attackers. They may be located at an arbitrary set of 
intermediate nodes in the network. Each attacker may inject 
corrupted packets, i.e., packets that do not belong to the source 
space, into a single or multiple downstream edges to pollute 
the network. They may also modify other data associated
with the packets, such as authentication tags. Successful
modification of authentication tags constitutes an attack called
"tag pollution," which could be as devastating as a pollution
attack [11]. Tag pollution attacks are discussed separately in
Section VIII-C. We consider both cases where the pollution
attackers launch their attacks independently or collude and 
coordinate their attacks. We further assume that the attackers 
are aware of our defense scheme, i.e., the construction and 
application of SpaceMac; however, similar to other crypto- 
graphic approaches, we assume that the attackers' running time 
is polynomial in the security parameter. 



C. Design Goals 

With this threat model in mind, we set out to design a 
defense system with the following goals: 

(1) In-network Detection. Any intermediate node in the net- 
work should be able to detect the attack as soon as 
its malicious parent injects a corrupted packet into the 
network. This prevents corrupted packets from polluting 
the downstream edges. 

(2) Exact Locating. The location of all pollution attackers 
should be precisely identified. This allows for the removal 
of the attackers from the network. 

(3) Arbitrary Collusion Resistance. The system should be able
to cope with multiple pollution attackers when they attack 
independently as well as when they collude. In particular, 
the defense system should be able to remove the attackers 
from the network even when they collude. 

(4) Low Overhead. The defense system should have low computation
and low communication overhead. In particular,
the system should require only a small amount of computation
from the nodes in the network and should not introduce a
large amount of traffic, e.g., bandwidth of the MAC tags,
to the network.

Fig. 2. An example illustrating how SpaceMac helps to detect pollution
attacks. Using SpaceMac, A and B are able to sign the expanding space
$\Pi_C$ (the received space of C) and D is able to verify any packet sent by C
to see if it belongs to $\Pi_C$. If there is a packet sent by C that is not in $\Pi_C$,
the attack is detected by D. The cooperation among A, B, and D helps to
detect the attack from C.

Fig. 3. An example of inferring an attacker's location using information
about polluted edges from [19]: The attacker is at node B. Scenarios (1) and
(2) correspond to the sets of polluted edges when the attacker lies and is
honest about its incoming subspace, respectively. The controller can narrow
down the attacker to two nodes: A and B, as they initiate polluted edges. The
cooperation among nodes in the network and the controller helps the locating
process.



To achieve the above goals, we design a defense system 
which consists of two main components: the detection scheme 
and the locating scheme. The detection scheme provides in- 
network detection while the locating scheme provides exact lo- 
cating. Both the detection scheme and locating scheme impose 
little computing overhead as well as communication overhead. 
The defense system as a whole is arbitrarily collusion resistant. 
Fig. 1 illustrates the overall structure of our defense system.

IV. Key Observations and Approach Overview 

In this section, we describe our key observations and how 
we exploit these observations in our design of the detection 
and locating schemes. 

A. In-Network Detection 

Previous work that uses homomorphic MACs to detect
corrupted packets, such as the work by Agrawal and Boneh
[10], Li et al. [11], and Zhang et al. [18], leverages the
observation that if a packet does not belong to the source
space, then it is a corrupted packet. The detection works 
by first establishing shared secret MAC keys between the 
source and the intermediate nodes. Then, using these secret 
keys, the source node can sign the fixed source space and the 
intermediate nodes can verify if their received packets belong 
to the source space. 

Our detection scheme leverages a different observation: A
packet sent by an intermediate node must belong to the space
spanned by all packets that it received from its parents (Lemma
1). For example, consider a subset of nodes in a network in Fig. 2.
At any moment in the multicast session, a packet sent by C
must belong to the space spanned by the packets it received
from its parents: A and B; otherwise, C must be polluting the
network. Formally, at any moment $t$ in the multicast session, if
an intermediate node $N$ sends out a vector $\mathbf{y}$ then $\mathbf{y} \in \Pi_N(t)$;
otherwise, $\mathbf{y}$ is corrupted.

We use SpaceMac to enable the parents of $N$ to sign the
expanding space $\Pi_N$ and the children of $N$ to verify any
packet that $N$ sends to see if it belongs to $\Pi_N$. The cooperation
of the parents of $N$ and the children of $N$ enables the children
to detect any corrupted packet sent by $N$ immediately. For
example, in Fig. 2, at any time $t$, D is able to check if any
packet it receives from C belongs to $\Pi_C(t)$; hence, D is able
to detect any pollution attempt by C as soon as D receives a
corrupted packet from C.

B. Exact Locating 

Leveraging the cooperation between nodes in the network 
and a central controller, Jafarisiavoshani et al. [19] have
shown that when there is a single pollution attacker, its location
can be narrowed down to a set of at most two nodes. This
is done by analyzing the polluted edges identified based on the
incoming subspaces reported by all the nodes to the central
controller. An example is shown in Fig. 3. When there are
multiple attackers in a general network topology, the number 
of suspected nodes increases to include the attackers and their 
parents and children. 

Our key observation here is that the uncertainty about 
the location of the attackers originates from the fact that 
the attackers can lie about their received spaces. Therefore, 
by ensuring that all nodes in the network cannot lie about 
their received spaces, we can exactly locate the attackers. For 
instance, if the attacker cannot lie in the example given in Fig.
3, then the only possible scenario is scenario (2). Thus, one can
determine that the attacker is at node B. We explicitly design 
and use SpaceMac to achieve this goal, i.e., prevent nodes 
from lying. To prevent attacker B from lying, A cooperates 
with the controller by signing the space spanned by the packets 
it sends to B using SpaceMac. This is so that when B reports 
a fake space, it will not have the proper signature of the fake 
space to convince the controller. 

V. The Construction of SpaceMac 

In this section, we describe the construction of SpaceMac. 
This construction is new and significantly more efficient than 
our old construction presented in [24]. This construction
is an improvement of the homomorphic MAC construction,
HomMac, proposed by Agrawal and Boneh [10]. The key
difference between SpaceMac and HomMac is the security
property that SpaceMac brings: SpaceMac allows for signing
spaces that expand over time, while HomMac only allows for
signing fixed spaces. This is directly reflected in the difference
between the security games of SpaceMac and HomMac,
and consequently the difference between the constructions.
SpaceMac improves and generalizes HomMac. Therefore,
SpaceMac can also be used to sign fixed spaces, e.g.,
in the place of HomMac in the detection scheme in [10].
However, it is not possible to use HomMac to support either
our detection scheme or our locating scheme because an
intermediate node does not know the entire source space
(which the Sign algorithm of HomMac requires) at the time of
tag generation.



A. Definitions 

A (q, n, m) homomorphic MAC scheme is defined by three 
probabilistic, polynomial-time algorithms: Mac, Combine, and 
Verify. The Mac algorithm generates a tag for a given vector; 
the Combine algorithm computes a tag for a linear combina- 
tion of some given vectors; and the Verify algorithm verifies 
whether a tag is a valid tag of a given vector. 

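• Mac($k$, id, $\mathbf{y}$):

  - Input: a secret key $k$, the identifier id of the source
    space $\Pi^S$, and a vector $\mathbf{y} \in \mathbb{F}_q^{n+m}$.

  - Output: tag $t$ for $\mathbf{y}$.

• Combine($(\mathbf{y}_1, t_1, \alpha_1), \dots, (\mathbf{y}_p, t_p, \alpha_p)$):

  - Input: $p$ vectors $\mathbf{y}_1, \dots, \mathbf{y}_p$, their tags $t_1, \dots, t_p$
    under key $k$, and their coefficients $\alpha_1, \dots, \alpha_p \in \mathbb{F}_q$.

  - Output: tag $t$ for vector $\mathbf{y} = \sum_{i=1}^{p} \alpha_i \mathbf{y}_i$.

• Verify($k$, id, $\mathbf{y}$, $t$):

  - Input: a secret key $k$, the identifier id of the source
    space $\Pi^S$, a vector $\mathbf{y} \in \mathbb{F}_q^{n+m}$, and its tag $t$.

  - Output: 0 (reject) or 1 (accept).

Also, the scheme must satisfy the following correctness requirement:

$$\mathrm{Verify}\Big(k, \mathrm{id}, \sum_{i=1}^{p} \alpha_i \mathbf{y}_i, \mathrm{Combine}\big((\mathbf{y}_1, t_1, \alpha_1), \dots, (\mathbf{y}_p, t_p, \alpha_p)\big)\Big) = 1.$$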



B. Attack Game 

We consider the following attack game for a homomorphic
MAC $T$ = (Mac, Combine, Verify), a challenger $\mathcal{C}$, and an
adversary $\mathcal{A}$:

• Setup. $\mathcal{C}$ generates a random key $k \xleftarrow{R} \mathcal{K}$.

• Queries. $\mathcal{A}$ adaptively queries $\mathcal{C}$, where each query is of
the form (id, $\mathbf{y}$). For each query, $\mathcal{C}$ replies to $\mathcal{A}$ with the
corresponding tag $t \leftarrow$ Mac($k$, id, $\mathbf{y}$).

• Output. $\mathcal{A}$ eventually outputs a tuple $(\mathrm{id}^*, \mathbf{y}^*, t^*)$.

By the time $\mathcal{A}$ outputs, it has queried $\mathcal{C}$ multiple times. Let
$l$ denote the number of queries that $\mathcal{A}$ made using $\mathrm{id}^*$, and let
$\mathbf{y}^*_1, \dots, \mathbf{y}^*_l$ denote the $l$ vectors for which it obtained tags in
these queries. Let $\mathbf{y}^* = (y^{*(1)}, \dots, y^{*(n+m)})$. We consider that the
adversary wins the security game if

(i) $(y^{*(n+1)}, \dots, y^{*(n+m)}) \neq \mathbf{0}$ (trivial forge otherwise),

(ii) Verify($k$, $\mathrm{id}^*$, $\mathbf{y}^*$, $t^*$) = 1, and

(iii) $\mathbf{y}^* \notin \mathrm{span}(\mathbf{y}^*_1, \dots, \mathbf{y}^*_l)$.

Let Adv[$\mathcal{A}$, $T$] denote the probability that $\mathcal{A}$ wins the above
attack game. We define a secure homomorphic MAC scheme
as follows:

Definition 2. A (q, n, m) homomorphic MAC scheme $T$ is
secure if for all probabilistic polynomial-time adversaries $\mathcal{A}$,
Adv[$\mathcal{A}$, $T$] is negligible.

C. Construction 

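Let $\mathcal{K}$ and $\mathcal{D}$ denote the domains of the keys and the
id's of the spaces sent by the source, respectively. Let $[n]$
denote $\{1, \dots, n\}$. We use a pseudorandom generator (PRG)
$G: \mathcal{K}_G \to \mathbb{F}_q^{n+m}$ and a pseudorandom function (PRF)
$F: \mathcal{K}_F \times \mathcal{D} \times [m] \to \mathbb{F}_q$. A key $k$ for this construction
consists of a pair $(k_1, k_2)$, where $k_1 \in \mathcal{K}_G$ and $k_2 \in \mathcal{K}_F$.

• Mac($k$, id, $\mathbf{y}$): A tag for a vector $\mathbf{y} = (y^{(1)}, \dots, y^{(n+m)})$
using key $k = (k_1, k_2)$ is generated as follows:

  - $\mathbf{r} \leftarrow G(k_1) \in \mathbb{F}_q^{n+m}$

  - $b \leftarrow \sum_{j=1}^{m} \left[ y^{(n+j)} \cdot F(k_2, \mathrm{id}, j) \right] \in \mathbb{F}_q$

  - $t \leftarrow (\mathbf{r} \cdot \mathbf{y}) + b \in \mathbb{F}_q$

• Combine($(\mathbf{y}_1, t_1, \alpha_1), \dots, (\mathbf{y}_p, t_p, \alpha_p)$): The tag $t$ of
$\mathbf{y} = \sum_{i=1}^{p} \alpha_i \mathbf{y}_i$ is computed as follows:

  - $t \leftarrow \sum_{i=1}^{p} \alpha_i t_i \in \mathbb{F}_q$

• Verify($k$, id, $\mathbf{y}$, $t$): To verify if $t$ is the valid tag of $\mathbf{y}$ using
key $k = (k_1, k_2)$, we proceed as follows:

  - $\mathbf{r} \leftarrow G(k_1) \in \mathbb{F}_q^{n+m}$

  - $b \leftarrow \sum_{j=1}^{m} \left[ y^{(n+j)} \cdot F(k_2, \mathrm{id}, j) \right] \in \mathbb{F}_q$

  - $a \leftarrow \mathbf{r} \cdot \mathbf{y} \in \mathbb{F}_q$

  - If $a + b = t$, output 1; otherwise, output 0.

The correctness of the scheme is proved as follows. Suppose
$\mathbf{y} = (y^{(1)}, \dots, y^{(n+m)}) = \sum_{i=1}^{p} \alpha_i \mathbf{y}_i$, where each $t_i$ is the tag of
$\mathbf{y}_i$ generated by Mac and $t$ is the output of Combine. Then,

$$
\begin{aligned}
a + b &= \mathbf{r} \cdot \mathbf{y} + \sum_{j=1}^{m} y^{(n+j)} F(k_2, \mathrm{id}, j) \\
&= \sum_{i=1}^{p} \alpha_i (\mathbf{r} \cdot \mathbf{y}_i) + \sum_{j=1}^{m} \Big[ \sum_{i=1}^{p} \alpha_i y_i^{(n+j)} \Big] F(k_2, \mathrm{id}, j) \\
&= \sum_{i=1}^{p} \alpha_i \Big[ \mathbf{r} \cdot \mathbf{y}_i + \sum_{j=1}^{m} y_i^{(n+j)} F(k_2, \mathrm{id}, j) \Big] \\
&= \sum_{i=1}^{p} \alpha_i t_i = t.
\end{aligned}
$$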



Compared to our old SpaceMac construction [24], this construction is significantly more efficient. In particular, compared to the old Mac and Verify algorithms, the new ones use one additional PRG call but significantly fewer PRF calls: $m$ as opposed to $n + m$. Considering that a PRG computation is more efficient than a PRF computation and that in practice, $n$ is typically an order of magnitude larger than $m$, e.g., $n = 2048$, $m = 128$ in a live video streaming system [25], this new construction is one order of magnitude more computationally efficient.

Compared to HomMac [10], our construction replaces the Sign algorithm, which generates tags for all basis vectors of a fixed space, with the Mac algorithm, which generates a tag for any vector in $\mathbb{F}_q^{n+m}$. The Combine algorithm shows that tags generated by our Mac algorithm can be combined to produce a valid tag for an arbitrary linear combination. We note that both of our constructions can be considered generalizations of the scheme in [10] because the Mac algorithm can substitute for the Sign algorithm, i.e., it can generate valid, combinable tags for all basis vectors.
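The three algorithms can be illustrated with a minimal Python sketch. It is not the authors' library: the prime field size `Q`, the helper names `prg`, `prf`, and `_to_field`, and the use of HMAC-SHA256 as a stand-in for both G and F are assumptions made only for illustration (the reduction modulo `Q` also ignores the slight bias a real instantiation would have to address).

```python
import hashlib
import hmac

Q = 2**31 - 1  # a prime field size picked only for this sketch


def _to_field(key, label):
    """Hash (key, label) to an element of F_Q with HMAC-SHA256 (illustrative only)."""
    digest = hmac.new(key, label, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % Q


def prg(k1, length):
    """Hypothetical stand-in for G: expand k1 into a vector r in F_Q^(n+m)."""
    return [_to_field(k1, b"G|%d" % i) for i in range(length)]


def prf(k2, space_id, j):
    """Hypothetical stand-in for F(k2, id, j) with output in F_Q."""
    return _to_field(k2, b"F|" + space_id + b"|%d" % j)


def mac(key, space_id, y, n, m):
    """t = (r . y) + sum_j y^(n+j) * F(k2, id, j), computed in F_Q."""
    k1, k2 = key
    r = prg(k1, n + m)
    a = sum(ri * yi for ri, yi in zip(r, y)) % Q
    b = sum(y[n + j - 1] * prf(k2, space_id, j) for j in range(1, m + 1)) % Q
    return (a + b) % Q


def combine(triples):
    """Tag of sum_i alpha_i * y_i, given triples (y_i, t_i, alpha_i)."""
    return sum(alpha * t for _y, t, alpha in triples) % Q


def verify(key, space_id, y, t, n, m):
    """Output 1 if t is the valid tag of y, and 0 otherwise."""
    return 1 if mac(key, space_id, y, n, m) == t % Q else 0


# Two tagged vectors; the last m entries are the augmentation part.
n, m = 4, 2
key = (b"prg-key", b"prf-key")
y1 = [3, 1, 4, 1, 1, 0]
y2 = [2, 7, 1, 8, 0, 1]
t1 = mac(key, b"gen-1", y1, n, m)
t2 = mac(key, b"gen-1", y2, n, m)

# A node without the key can still tag the combination 5*y1 + 9*y2.
y = [(5 * u + 9 * v) % Q for u, v in zip(y1, y2)]
t = combine([(y1, t1, 5), (y2, t2, 9)])
print(verify(key, b"gen-1", y, t, n, m))
```

The final call checks the correctness requirement on one concrete combination; because the tag is linear in the vector, the combined tag verifies without the combining node ever seeing the key.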

D. Security 

The security of SpaceMac is proven by assuming $F$ is a secure PRF and $G$ is a secure PRG. Let $B_1$ and $B_2$ denote a PRF adversary and a PRG adversary, respectively. Let PRF-Adv$[B_1, F]$ and PRG-Adv$[B_2, G]$ denote the advantages in winning the PRF and PRG security games, respectively.¹

¹ The definitions of the PRF and PRG security games can be found in [26].



Theorem 3. For any fixed $q$, $n$, $m$, SpaceMac is a secure $(q, n, m)$ homomorphic MAC assuming $F$ is a secure PRF and $G$ is a secure PRG. In particular, for every homomorphic MAC adversary $A$, there is a PRF adversary $B_1$ and a PRG adversary $B_2$, who have similar running time to $A$, such that

$$\mathrm{Adv}[A, \mathrm{SpaceMac}] \le \text{PRF-Adv}[B_1, F] + \text{PRG-Adv}[B_2, G] + \frac{1}{q} .$$



Proof: The proof uses a sequence of games denoted as Game 0, 1, and 2. Let $W_0$, $W_1$, and $W_2$ denote the events that $A$ wins the homomorphic MAC security game in Game 0, 1, and 2, respectively. Let Game 0 be identical to the Attack Game. Hence,

$$\Pr[W_0] = \mathrm{Adv}[A, \mathrm{SpaceMac}] . \tag{1}$$

In Game 1, the output of the PRG $G$ is replaced by a truly random string, i.e., to respond to the Mac query, the challenger computes $\mathbf{r} \xleftarrow{R} \mathbb{F}_q^{n+m}$ instead of $\mathbf{r} \leftarrow G(k_1)$. Everything else remains the same. Then, there exists a PRG adversary $B_2$ such that

$$|\Pr[W_0] - \Pr[W_1]| = \text{PRG-Adv}[B_2, G] . \tag{2}$$

In Game 2, the PRF $F$ is replaced by a truly random function, i.e., to respond to the Mac query, the challenger computes $b \leftarrow \sum_{j=1}^{m} \big[ y^{(n+j)} \cdot s^{(j)} \big]$, where $s^{(j)} \xleftarrow{R} \mathbb{F}_q$ instead of $s^{(j)} \leftarrow F(k_2, \mathrm{id}, j)$. Everything else remains the same. Then, there exists a PRF adversary $B_1$ such that

$$|\Pr[W_1] - \Pr[W_2]| = \text{PRF-Adv}[B_1, F] . \tag{3}$$

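The complete challenger in Game 2 works as follows:

Initialization. $\mathbf{r} \xleftarrow{R} \mathbb{F}_q^{n+m}$

Queries. $A$ adaptively queries $C$, where each query is of the form $(\mathrm{id}, \mathbf{y})$. $C$ replies to query $i$, $(\mathrm{id}_i, \mathbf{y}_i)$, as follows:

    if $\mathrm{id}_i$ is never used in any of the previous queries:
        $s_i^{(j)} \xleftarrow{R} \mathbb{F}_q$ for $j = 1, \cdots, m$
    else:
        $s_i^{(j)}$'s := the ones used in the previous response
    send $t_i \leftarrow (\mathbf{r} \cdot \mathbf{y}_i) + \sum_{j=1}^{m} \big[ y_i^{(n+j)} \cdot s_i^{(j)} \big]$

Output. $A$ eventually outputs a tuple $(\mathrm{id}^*, \mathbf{y}^*, t^*)$. To determine if $A$ wins the game we compute

    if $\mathrm{id}^* = \mathrm{id}_i$ (for some $i$) then    // case (i)
        set $s_*^{(j)} := s_i^{(j)}$ for $j = 1, \cdots, m$
    else    // case (ii)
        set $s_*^{(j)} \xleftarrow{R} \mathbb{F}_q$ for $j = 1, \cdots, m$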

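Let $l$ denote the number of queries that $A$ made to $C$ using $\mathrm{id}^*$, and let $\mathbf{y}^*_1, \cdots, \mathbf{y}^*_l$ be the vectors for which it obtained tags in these queries. The adversary wins the game, i.e., event $W_2$ happens, if

$$(y^{*(n+1)}, \cdots, y^{*(n+m)}) \neq \mathbf{0} , \tag{4}$$

$$t^* = (\mathbf{r} \cdot \mathbf{y}^*) + \sum_{j=1}^{m} \big[ y^{*(n+j)} \cdot s_*^{(j)} \big] , \text{ and} \tag{5}$$

$$\mathbf{y}^* \notin \mathrm{span}(\mathbf{y}^*_1, \cdots, \mathbf{y}^*_l) . \tag{6}$$

In the following steps, we show that $\Pr[W_2] = \frac{1}{q}$. Let $T$ be the event that $A$ outputs the tuple with a completely new $\mathrm{id}^*$, i.e., $A$ never made queries using $\mathrm{id}^*$ before.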



When $T$ happens, i.e., in case (ii), since the $s_*^{(j)}$'s are indistinguishable from random values in $\mathbb{F}_q$, and the $y^{*(n+j)}$'s are not all zeros, the right-hand side of equation (5) is a completely random value in $\mathbb{F}_q$, independent of the adversary's view. Thus,

$$\Pr[W_2 \wedge T] = \frac{1}{q} \cdot \Pr[T] . \tag{7}$$

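When $T$ does not happen, i.e., in case (i), the $s_*^{(j)}$'s of equation (5) have been used to generate tags for vectors $\mathbf{y}^*_1, \cdots, \mathbf{y}^*_l$. In this case, we proceed by showing that when $\mathbf{y}^* \notin \mathrm{span}(\mathbf{y}^*_1, \cdots, \mathbf{y}^*_l)$, the right-hand side of equation (5) looks indistinguishable from a random value in $\mathbb{F}_q$. To this end, let $\Pi = \mathrm{span}(\mathbf{y}^*_1, \cdots, \mathbf{y}^*_l)$, and let $d$ be the dimension of $\Pi$. Note that $d < n + m$ because otherwise $\Pi = \mathbb{F}_q^{n+m}$, which implies $\mathbf{y}^* \in \Pi$. Let $\{\mathbf{b}_1, \cdots, \mathbf{b}_d\}$ be a basis of $\Pi$. Denote $\mathrm{aug}(\mathbf{y})$ as the augmentation of vector $\mathbf{y}$, i.e., $\mathrm{aug}(\mathbf{y}) = (y^{(n+1)}, \cdots, y^{(n+m)})$.

Case (a): Consider the case when $\mathrm{aug}(\mathbf{y}^*)$ can be expressed as a linear combination of $\mathrm{aug}(\mathbf{b}_i)$, $i \in [1, d]$. Let $\mathrm{aug}(\mathbf{y}^*) = \sum_{i=1}^{d} \alpha_i \, \mathrm{aug}(\mathbf{b}_i)$ for some $\alpha_i$. If we let $\mathbf{y}' = \sum_{i=1}^{d} \alpha_i \mathbf{b}_i$, then the valid tag of $\mathbf{y}'$ for the same space $\mathrm{id}^*$ as $\mathbf{y}^*$ is

$$t' = \mathbf{r} \cdot \mathbf{y}' + \sum_{j=1}^{m} \Big[ \sum_{i=1}^{d} \alpha_i b_i^{(n+j)} \Big] s_*^{(j)} . \tag{8}$$

By subtracting equation (8) from (5), and noting that the two sums cancel because $\mathrm{aug}(\mathbf{y}^*) = \mathrm{aug}(\mathbf{y}')$, we know that by producing a valid forgery, the adversary found a $\mathbf{y}^*$ and $t^*$ that satisfy the following equation:

$$t^* - t' = \mathbf{r} \cdot (\mathbf{y}^* - \mathbf{y}') . \tag{9}$$

However, since $\mathbf{y}^* \neq \mathbf{y}'$ ($\mathbf{y}'$ is in $\Pi$ but $\mathbf{y}^*$ is not), and $\mathbf{r}$ is indistinguishable from a random vector in $\mathbb{F}_q^{n+m}$, the probability that the adversary can satisfy (9) is exactly $\frac{1}{q}$.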

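Case (b): Consider the case when $\mathrm{aug}(\mathbf{y}^*)$ cannot be expressed as a linear combination of $\mathrm{aug}(\mathbf{b}_i)$'s. In this case, we proceed by showing that given a fixed $\mathbf{y}^*$, from the perspective of the adversary, the valid tag $t^*$ of $\mathbf{y}^*$ is indistinguishable from a random value in $\mathbb{F}_q$.

Let $s_*^{(i)}$, $i \in [1, m]$, be the unknowns, and $\mathbf{s} = (s_*^{(1)}, \cdots, s_*^{(m)})$. By the previous $l$ queries, the adversary learns the following system of $l$ equations and $m$ unknowns:

$$\text{(I)} \quad \begin{cases} \mathrm{aug}(\mathbf{y}^*_1) \cdot \mathbf{s} = t_{\mathbf{y}^*_1} - \mathbf{r} \cdot \mathbf{y}^*_1 \\ \quad \vdots \\ \mathrm{aug}(\mathbf{y}^*_l) \cdot \mathbf{s} = t_{\mathbf{y}^*_l} - \mathbf{r} \cdot \mathbf{y}^*_l \end{cases}$$

Since $\{\mathbf{b}_1, \cdots, \mathbf{b}_d\}$ is a basis of $\Pi = \mathrm{span}(\mathbf{y}^*_1, \cdots, \mathbf{y}^*_l)$, the above system is equivalent to the following system of $d$ equations:

$$\text{(II)} \quad \begin{cases} \mathrm{aug}(\mathbf{b}_1) \cdot \mathbf{s} = u_1 \\ \quad \vdots \\ \mathrm{aug}(\mathbf{b}_d) \cdot \mathbf{s} = u_d \end{cases}$$

where each $u_i$ is a linear combination of right-hand-side values of the equations of system (I).

Let $\Pi' = \mathrm{span}(\mathrm{aug}(\mathbf{b}_1), \cdots, \mathrm{aug}(\mathbf{b}_d))$ and let $d'$ be the dimension of $\Pi'$. Note that $d' \le d$. Let $\{\mathbf{c}_1, \cdots, \mathbf{c}_{d'}\}$ be a basis of $\Pi'$. Note that since $\mathrm{aug}(\mathbf{y}^*) \notin \Pi'$, $\mathrm{aug}(\mathbf{y}^*)$ cannot be expressed as a linear combination of the $\mathbf{c}_i$'s. The system of equations (II) is equivalent to the following system of $d'$ equations:

$$\text{(III)} \quad \begin{cases} \mathbf{c}_1 \cdot \mathbf{s} = e_1 \\ \quad \vdots \\ \mathbf{c}_{d'} \cdot \mathbf{s} = e_{d'} \end{cases}$$

where each $e_i$ is a linear combination of right-hand-side values of the equations of system (II). A valid tag $t^*$ of $\mathbf{y}^*$ satisfies the following equation:

$$\mathrm{aug}(\mathbf{y}^*) \cdot \mathbf{s} = t^* - \mathbf{r} \cdot \mathbf{y}^* . \tag{10}$$

Without loss of generality, assume that the adversary knows $\mathbf{r}$. Note that $d' < m$ because otherwise $\Pi' = \mathbb{F}_q^{m}$, which would imply $\mathrm{aug}(\mathbf{y}^*) \in \Pi'$. The system of $m$ unknowns and $d' + 1$ linear equations, $d'$ from (III) and 1 from (10), is consistent regardless of the value of $t^*$ because the coefficient matrix, whose rows are the linearly independent vectors $\mathbf{c}_1, \cdots, \mathbf{c}_{d'}$, and $\mathrm{aug}(\mathbf{y}^*)$, has rank $d' + 1 \le m$. Furthermore, for any value $t^*$, the solution space always has size $q^{m-d'-1}$. Thus, for a fixed $\mathbf{y}^*$, its valid tag, $t^*$, could be any value in $\mathbb{F}_q$ equally likely, given that the $s_*^{(i)}$'s are chosen uniformly at random from $\mathbb{F}_q$. Hence, the probability of forging a valid tag is $\frac{1}{q}$.

By the result of case (a) and case (b),

$$\Pr[W_2 \wedge \neg T] = \frac{1}{q} \cdot \Pr[\neg T] . \tag{11}$$

From equations (7) and (11), we have

$$\Pr[W_2] = \Pr[W_2 \wedge T] + \Pr[W_2 \wedge \neg T] = \frac{1}{q} . \tag{12}$$

Equations (1), (2), (3), and (12) together prove the theorem. ■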



Theorem 3 states that an adversary $A$ can break the scheme with probability $\frac{1}{q}$. For a small field size, e.g., $q = 2^8$, the security of the MAC scheme may be unsatisfactory. To improve the security, one could either increase the field size
or use multiple tags as suggested in [10] and [11]. The security of our scheme using $l$ tags is $(\frac{1}{q})^l$. As observed in [11], it is preferable to use multiple tags instead of increasing the field size. This is because in order to achieve the same security $(\frac{1}{q})^l$, using the field size $q^l$ instead of using $l$ tags increases the computational complexity of field multiplication by $\log l$ times.
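As a rough numerical illustration of the multi-tag trade-off, the following Python sketch tabulates the forgery probability $(1/q)^l$ and the equivalent security level in bits. The function names are hypothetical, the field size $2^8$ is used only as an example, and the sketch assumes the $l$ tags are computed under independent keys so that the individual forgery probabilities multiply.

```python
import math


def forgery_probability(q, num_tags):
    """Probability (1/q)^l that all l tags are forged at once."""
    return (1.0 / q) ** num_tags


def security_bits(q, num_tags):
    """Equivalent security level, l * log2(q) bits."""
    return num_tags * math.log2(q)


# Example field size 2^8 with an increasing number of tags.
for num_tags in (1, 2, 4, 8):
    print(num_tags, security_bits(2**8, num_tags), forgery_probability(2**8, num_tags))
```

Each additional tag adds $\log_2 q$ bits of security while keeping all arithmetic in the original small field.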




Fig. 4. A network consisting of 8 nodes (resembling a network given in [19]). B is the attacker. After every node (except for S) reports to the controller its true incoming spaces, B is identified as the attacker since it has no incoming polluted edge but has one outgoing polluted edge.



VI. Detection Scheme 

In this section, we describe our detection scheme in detail. Our scheme exploits the observation outlined in Section IV-A to provide in-network detection. In particular, parents and children of an intermediate node N cooperate through SpaceMac to detect any corrupted packet sent by N in a timely manner. For ease of presentation, we describe the detection scheme within the scope of a single generation, i.e., considering a single source space id.

1) Assumptions. We assume that there is a controller who knows the complete topology of the graph. The controller could be the source itself. This assumption is also made in recent work by Li et al. [11]. We further assume that each node N shares with the controller a pair of secret keys $(k_N^e, k_N^a)$. These keys can be established with the help of a Public Key Infrastructure (PKI). We note that recently proposed schemes that use homomorphic MACs also assume
the existence of a PKI (TT), j fTSj or the existence of shared 
secret keys |10|. In general, the problem of establishing 
shared secret keys is a challenging problem of its own and 
is orthogonal to this work. 

2) Bootstrapping. First, for every intermediate node N, the controller determines the key $k_N$, which is kept secret from N itself and is used by the parents and children of N when using SpaceMac. Each node can serve as either a parent or a child; therefore, each node, depending on its position in the network, needs to know a different set of keys to participate in the detection scheme. For example, consider the network in Fig. 4: node D needs to know $k_A$, $k_B$, and $k_C$ to detect corrupted packets sent by A, B, and C, respectively. It also needs to know $k_E$ to help $R_1$ and $R_2$ detect corrupted packets sent by E.

Second, the source and all the receivers need to share an end-to-end key, $k^*$. This key is used to ensure detection in the presence of colluding adversaries, in which case a node N colludes with its parent to obtain $k_N$ and thus can bypass the verification of its children. We defer the discussion about multiple adversaries to Section VIII-B, where we analyze different colluding scenarios in depth.

The controller then sends to each node N a bootstrapping packet consisting of the set of keys that are necessary for it to participate in the detection scheme. In particular, the controller constructs the bootstrapping packet $b_N = \{\{k_X\}_{X \in \mathcal{P}_N \cup \mathcal{C}_N}, k^*\}$, i.e., the keys of the parents and children of N. Note that $b_N$ contains $k^*$ if and only if N is either the source or a receiver. The controller then sends $b_N$ to N through a secure and authenticated channel achieved with $k_N^e$ (for encryption) and $k_N^a$ (for authentication) using a standard encrypt-then-authenticate algorithm.² For example, node D in Fig. 4 receives $\{k_A, k_B, k_C, k_E\}$, while node $R_1$ receives three node keys together with $k^*$.

We note that the MAC keys $k^*$ and $k_X$ can be used for multiple source spaces (generations). This is because, by construction, SpaceMac takes into account generation identifiers when computing tags. As a result, the overhead of key establishment is incurred once per multiple generations as opposed to once per generation. Hence, this overhead is asymptotically negligible in the number of generations.

Finally, if each node knows its own 2-hop neighborhood information, then the MAC keys can also be bootstrapped without the help of the controller. In particular, by using a secure key distribution scheme for ad-hoc networks, such as [27], the source can establish the shared secret MAC key $k^*$ with the receivers, and for a node N, a parent P can establish a shared secret MAC key $k_N$ with the parents and children of N.

3) Sending and Coding. Before sending out each source packet, $\mathbf{v}_i$, the source S calculates an end-to-end tag, $t_{\mathbf{v}_i}$, using the Mac algorithm of SpaceMac with key $k^*$: $t_{\mathbf{v}_i} = \mathrm{Mac}(\mathbf{v}_i, k^*)$. S then attaches this tag to every source packet and sends $\mathbf{w}_i = \{t_{\mathbf{v}_i} \,\|\, \mathbf{v}_i\}$ instead of $\mathbf{v}_i$, where '||' denotes concatenation. The packets traversing the network are linear combinations of $\mathbf{w}_i$'s instead of $\mathbf{v}_i$'s. For ease of presentation with regard to the input length of the SpaceMac algorithms, assume that the size of $\mathbf{w}_i$ is still $n$.

Consider a parent P who wants to send a packet $\mathbf{y}$ to its child N. P needs to calculate a helper tag which helps the children of N to detect corrupted packets sent by N. In particular, before sending $\mathbf{y}$ to N, P needs to calculate a MAC tag, $t_{\mathbf{y}}^{N}$, using Mac under key $k_N$: $t_{\mathbf{y}}^{N} = \mathrm{Mac}(\mathbf{y}, k_N)$. Besides the helper tag, P must also pass along a verification tag of $\mathbf{y}$, which is used by N to verify the integrity of $\mathbf{y}$. Assume P received $\{\mathbf{y}_1, \cdots, \mathbf{y}_l\}$ and their helper tags $\{t_{\mathbf{y}_1}^{P}, \cdots, t_{\mathbf{y}_l}^{P}\}$ from its parents, and P computes $\mathbf{y}$ as $\mathbf{y} = \sum_{i=1}^{l} \alpha_i \mathbf{y}_i$. Then, the verification tag, $t_{\mathbf{y}}^{P}$, of $\mathbf{y}$ can be computed using Combine: $t_{\mathbf{y}}^{P} = \mathrm{Combine}((\mathbf{y}_1, t_{\mathbf{y}_1}^{P}, \alpha_1), \cdots, (\mathbf{y}_l, t_{\mathbf{y}_l}^{P}, \alpha_l))$. The final packet that P sends to N includes $\mathbf{y}$ and its helper and verification tags: $\{t_{\mathbf{y}}^{N} \,\|\, t_{\mathbf{y}}^{P} \,\|\, \mathbf{y}\}$.

We note that if a node is benign, besides explicitly calcu- 
lating the helper tag, it would code and send packets in a way 
identical to what it does normally. The verification tag will be 
computed correctly because the Combine algorithm linearly 
combines the tags in the same way the packets are combined, 
i.e., with the same set of coefficients. 

4) Receiving and Verification. When a node N receives from its parent P a packet $\{t_{\mathbf{y}}^{N} \,\|\, t_{\mathbf{y}}^{P} \,\|\, \mathbf{y}\}$, it uses $k_P$ and the Verify algorithm to verify the integrity of the packet. The packet is deemed non-corrupted if $\mathrm{Verify}(k_P, \mathbf{y}, t_{\mathbf{y}}^{P}) = 1$. The security guarantee comes from the security of SpaceMac: since P does not know $k_P$, the probability that P can forge a valid tag, $t_{\mathbf{y}}^{P}$, when $\mathbf{y}$ is outside of its received space, $\Pi_P$, is $\frac{1}{q}$. As a result, as soon as N receives a corrupted packet from P, with high probability, N is able to detect the attack immediately.

In the case that N is a receiver, it further verifies the end-to-end tag using key $k^*$. Parse $\mathbf{y}$ as $\{t_{\mathbf{w}} \,\|\, \mathbf{w}\}$. N accepts $\mathbf{w}$ if $\mathrm{Verify}(k^*, \mathbf{w}, t_{\mathbf{w}}) = 1$. The security guarantee, again, comes from the security of SpaceMac: since none of the malicious intermediate nodes knows $k^*$, if $\mathbf{w}$ is outside of the source space, the adversary can only forge a valid tag of $\mathbf{w}$, $t_{\mathbf{w}}$, with a negligible probability of $\frac{1}{q}$. This second level of verification provides a detection mechanism in the presence of colluding adversaries.

² We refer the reader to Chapter 4.9 in [26] for more details on encrypt-then-authenticate algorithms.
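The interplay between helper tags and verification tags can be sketched in Python as follows. To keep the example short, it uses a toy linear MAC (an inner product with a key-derived vector) rather than SpaceMac itself; the names `toy_mac`, `k_P`, `k_N`, the field size `Q`, and the sample packets are all assumptions made for illustration.

```python
import hashlib
import hmac

Q = 2**31 - 1  # illustrative prime field size


def toy_mac(key, y):
    """Toy linear MAC: inner product of y with a key-derived nonzero vector.
    This is a simplified stand-in, not the SpaceMac construction."""
    r = [int.from_bytes(hmac.new(key, b"%d" % i, hashlib.sha256).digest(), "big")
         % (Q - 1) + 1 for i in range(len(y))]
    return sum(a * b for a, b in zip(r, y)) % Q


def combine(triples):
    """Linear combination of tags with the same coefficients as the packets."""
    return sum(alpha * t for _y, t, alpha in triples) % Q


k_P, k_N = b"key-for-P", b"key-for-N"  # P knows k_N but not k_P

# P's parents attach helper tags, computed under k_P, to the packets they send to P.
y1, y2 = [1, 2, 3, 1, 0], [4, 5, 6, 0, 1]
received = [(y1, toy_mac(k_P, y1)), (y2, toy_mac(k_P, y2))]

# P codes y = 2*y1 + 3*y2, combines the verification tag, and adds a helper tag under k_N.
alphas = [2, 3]
y = [(2 * a + 3 * b) % Q for a, b in zip(y1, y2)]
verification_tag = combine([(yi, ti, a) for (yi, ti), a in zip(received, alphas)])
helper_tag = toy_mac(k_N, y)

# N, which knows k_P, accepts the honestly coded packet.
accepted = toy_mac(k_P, y) == verification_tag

# A packet modified by P no longer matches the combined verification tag.
polluted = list(y)
polluted[0] = (polluted[0] + 1) % Q
rejected = toy_mac(k_P, polluted) != verification_tag
print(accepted, rejected)
```

The point of the sketch is the key placement: P can combine tags under $k_P$ without knowing $k_P$, so it can only produce valid verification tags for vectors inside the space it actually received.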

VII. Locating Scheme 

Locating the attackers to eventually eliminate them is a 
logical step after a pollution attack is detected. In this section, 
we describe in detail how we exploit the observation made 



in Section IV-B to exactly locate the pollution attackers. The 
main idea is to force nodes to truthfully report their received 
spaces to correctly identify polluted edges, thereby enabling 
the exact identification of the location of the attackers. 

A. Overview 

1) Reporting. The following lemma, originally presented in [18] and [19], implies that for each received subspace, $\Pi_N^P$, from a parent P, node N may report a randomly chosen packet, $\mathbf{y}_r$, of the space instead of the space itself; and by checking if $\mathbf{y}_r \in \Pi^S$, where $\Pi^S$ is the source space, the controller can determine if $\Pi_N^P \subseteq \Pi^S$ to identify the polluted edges.

Lemma 4 (Jafarisiavoshani et al. [19], [18]). Let $\Pi_1$ and $\Pi_2$ be two subspaces of $\mathbb{F}_q^{n+m}$ and assume that $\mathbf{y}_r$ is a randomly selected packet from $\Pi_1$. Let $d_{12}$ and $d_1$ be the dimensions of $\Pi_1 \cap \Pi_2$ and $\Pi_1$, respectively. With probability $1 - q^{d_{12} - d_1}$, $\mathbf{y}_r \in \Pi_2$ if and only if $\Pi_1 \subseteq \Pi_2$.
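A minimal Python sketch of the single-sample test implied by Lemma 4 is given below. The prime field size `P`, the function names, and the use of Gaussian elimination for the membership check are assumptions chosen for illustration; they are not taken from the paper's implementation.

```python
import random

P = 257  # small prime field, for illustration only


def rank_mod_p(rows, p=P):
    """Rank of a matrix over F_p via Gaussian elimination."""
    rows = [[x % p for x in row] for row in rows]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                f = rows[i][col]
                rows[i] = [(a - f * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def in_span(vec, basis, p=P):
    """True if vec lies in the span of basis over F_p."""
    return rank_mod_p(basis + [vec], p) == rank_mod_p(basis, p)


def random_vector(basis, p=P, rng=random):
    """Uniformly random linear combination of the basis vectors."""
    coeffs = [rng.randrange(p) for _ in basis]
    return [sum(c * b[k] for c, b in zip(coeffs, basis)) % p
            for k in range(len(basis[0]))]


def probably_subspace(basis1, basis2, p=P, rng=random):
    """One-sample test: does a random y_r from span(basis1) lie in span(basis2)?"""
    return in_span(random_vector(basis1, p, rng), basis2, p)


source_space = [[1, 0, 0], [0, 1, 0]]
print(probably_subspace([[1, 1, 0]], source_space))  # reported space inside the source space
print(in_span([0, 0, 1], source_space))              # a vector outside the source space
```

When the reported space is not contained in the source space, the one-sample test wrongly passes only if the random vector happens to fall in the intersection, which matches the $q^{d_{12} - d_1}$ term of the lemma.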

2) Using SpaceMac. We use SpaceMac to prevent nodes from lying about their received spaces as follows. To force a node N to report the true space, $\Pi_N^P$, that it received from its parent, P, the parent P and the controller cooperate so that the controller only accepts reported packets that belong to $\Pi_N^P$. In particular, whenever P sends a vector $\mathbf{y}_i$ to N, it generates a tag, $t_{\mathbf{y}_i}$, of $\mathbf{y}_i$ using the Mac algorithm with a secret key shared by P and the controller. Then, when N reports $\mathbf{y}_r$, if $\mathbf{y}_r$ is a linear combination of the vectors $\mathbf{y}_i$ that it received from P, then N can generate a valid tag for $\mathbf{y}_r$ by using the Combine algorithm on the tags of the $\mathbf{y}_i$'s that it received; if $\mathbf{y}_r$ is not a linear combination of the $\mathbf{y}_i$'s, then N can forge a valid tag for $\mathbf{y}_r$ with only a negligible probability of $\frac{1}{q}$.

3) Non-Repudiation Transmission Protocol. As presented, SpaceMac forces nodes to report only true received subspaces since it is computationally difficult to forge valid tags otherwise. However, it does not prevent a malicious node from sending invalid tags to its children to prevent the children from reporting polluted spaces.

For example, an attacker P can send a polluted packet $\mathbf{y}_e \notin \Pi^S$ and a bogus tag $t_e$ to its child N. When N reports the space $\Pi_N^P$, if the randomly chosen vector $\mathbf{y}_r$ was formed by a linear combination involving $\mathbf{y}_e$, then the aggregated tag $t_r$ of $\mathbf{y}_r$ that N generates using the Combine algorithm will be invalid due to the bogus tag $t_e$. As a result, the controller will reject $\mathbf{y}_r$. Consequently, the attacker P successfully prevents its benign child N from reporting the polluted space $\Pi_N^P$.

To address this, we utilize an efficient non-repudiation transmission protocol proposed by Wang et al. [20]. For a parent P and a child N, the controller generates a set of secret keys, denoted by $\mathcal{X}$, based on the private key of the parent and the ID N of the child. After that, the controller randomly selects a set of keys $\mathcal{Y}$ from $\mathcal{X}$ based on the private key of the child and the ID P of the parent; then, it sends $\mathcal{Y}$ to the child. We denote $\mathcal{X} \setminus \mathcal{Y}$ as $\bar{\mathcal{Y}}$; also, let $\lambda = |\mathcal{X}|$ and $\delta = |\mathcal{Y}|$.

When sending a packet, P generates $\lambda$ tags (instead of one) using the Mac algorithm and all keys in $\mathcal{X}$. When receiving a packet, N uses its set of keys $\mathcal{Y}$ and the Verify algorithm to verify $\delta$ out of the $\lambda$ tags. Finally, when receiving a packet $\mathbf{y}_r$ randomly chosen from $\Pi_N^P$ and its $\lambda$ tags from N, the controller uses all keys in $\bar{\mathcal{Y}}$ and the Verify algorithm to verify all $\lambda - \delta$ remaining tags. The controller, in this case, keeps track of a counter $\theta \le \lambda - \delta$. If at least $\theta$ tags pass the verification, then the controller accepts the report.

The following two lemmas provide the security of the non-repudiation transmission protocol when applied to our context. Lemma 5 is identical to Theorem 1 in [20]. Lemma 6 is an adapted version of Theorem 2 in [20]; the difference is that in our case, a node does not report a packet that it receives; instead, it reports a linear combination of the packets that it receives.

Lemma 5 (Non-repudiation of the receiver, Wang et al. [20]). The probability that a malicious child node can successfully report to the controller that its parent sent it a packet $\mathbf{y}$, which was never sent by the parent, is at most



E 



A-A 1 
i J 



Lemma 6 (Non-repudiation of the sender, Wang et al. [20]). The probability that a malicious parent can make the controller reject the parent's space reported by its child, by sending the child some packets with invalid tags, is at most



min{S,x) 

max p(x), where p(x) < > 
0<x<5+6l-l^ ' ^-v / - ^ 



itd 



The proofs of these two lemmas are provided in the Appendix. Finally, we note that both of the above probabilities can be made very small by choosing appropriate values for $q$, $\lambda$, $\delta$, and $\theta$. Examples of values for these parameters and



the corresponding probabilities are provided in Table III The 
choice of parameters can then be made based on the desired 
tradeoff between the overhead and the probability that the 
attacker succeeds. 

4) Locating the Attackers. After the controller collects the true subspaces from every node, we proceed similarly to the approach by Jafarisiavoshani et al. [19] to locate the attackers. Here, we discuss the case when there is a single attacker. We defer the case when there are multiple attackers to Section VIII-B.

In [19], the authors have shown that in a general network which has a single adversary, the location of the adversary can be narrowed down to a set of at most two nodes, both when the adversary injects corrupted packets into one downstream edge and when it injects them into multiple downstream edges. This is done by partitioning the edges into two sets: the set of polluted edges, $\mathcal{E}_p$, and the set of non-polluted edges, $\mathcal{E}_s$, then analyzing the nodes with respect to the identified $\mathcal{E}_p$ and $\mathcal{E}_s$. They also note that the partitioning into $\mathcal{E}_p$ and $\mathcal{E}_s$ is not unique since the adversary might lie, which results in uncertainty about the location of the attacker.

Fortunately, when the partition reflects the real state of pollution of the edges in the network, i.e., when the adversary is forced to report its true incoming spaces, the adversary is always the node that has no incoming edge belonging to $\mathcal{E}_p$ but has at least one outgoing edge belonging to $\mathcal{E}_p$. Fig. 4 shows the case where B is an attacker who is identified because it has no incoming polluted edge but one outgoing polluted edge.

Using our scheme, the probability that the attacker lies about its incoming spaces is very small (Lemma 5). Furthermore, the probability that the attacker can prevent its children from reporting the subspaces polluted by itself is very small, too (Lemma 6). As a result, with high probability (depending on $q$, $\lambda$, $\delta$, and $\theta$), our scheme can produce an unambiguous partitioning into $\mathcal{E}_p$ and $\mathcal{E}_s$, which helps to precisely locate the attacker.

B. Full Description 

To distinguish cryptographic keys used in the detection scheme from keys used in the locating scheme, we decorate any key used in the locating scheme with an overhead bar, e.g., $\bar{k}$.

1) Assumptions. Similar to the assumptions we made in the detection scheme, we assume that there is a controller (which could be the source itself) who knows the complete topology and the source space. This assumption is also made in recently proposed locating schemes [19], [20]. We assume that each
node N shares a triplet of secret keys [k]^, k%, kf^) with the 
controller (with the help of a PKI). We further assume that 
each node knows the identifiers of its adjacent nodes (can be 
bootstrapped by the controller). In addition, we assume that 
there is a reliable low-bandwidth end-to-end communication path between the controller and each node (the channel for the reports and the announcement by the controller). Other locating schemes, such as [19] and [20], implicitly made this assumption.

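2) Bootstrapping. Let $\mathcal{N}$ be the set of IDs of adjacent downstream nodes of P. For $N \in \mathcal{N}$, the controller generates a set $\mathcal{X}_{PN}$ of $\lambda$ keys using a PRF $F_1: \mathcal{K} \times \mathcal{I} \times [\lambda] \to \mathcal{K}$, where $\mathcal{K}$ is the domain of key $\bar{k}_P$, and $\mathcal{I}$ is the domain of the identifiers of the nodes: $\mathcal{X}_{PN} = \{F_1(\bar{k}_P, N, i), \text{ for } i = 1, \cdots, \lambda\}$. Note that P can compute $\mathcal{X}_{PN}$ itself as it knows $\bar{k}_P$ and its neighbors' identifiers.

For $N \in \mathcal{N}$, consider an array $L$ whose elements are distinct subsets of size $\delta$ of $\mathcal{X}_{PN}$. Note that $L$ has length $\binom{\lambda}{\delta}$. The controller uses another PRF $F_2: \mathcal{K} \times \mathcal{I} \to [\binom{\lambda}{\delta}]$ to select from $L$ a subset of size $\delta$: $\mathcal{Y}_{PN} = L[i]$, where $i \leftarrow F_2(\bar{k}_N, P)$.

The controller then sends $\mathcal{Y}_{PN}$ to node N through a secure and authenticated channel achieved with the encryption and authentication keys that N shares with the controller, using an encrypt-then-authenticate algorithm. Note that similar to $k^*$ and $k_X$, the sets of keys $\mathcal{X}_{PN}$ and $\mathcal{Y}_{PN}$ can be used across multiple generations.

Denote $\mathcal{X}_{PN} \setminus \mathcal{Y}_{PN}$ as $\bar{\mathcal{Y}}_{PN}$.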
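The key-set derivation above can be sketched in Python as follows. HMAC-SHA256 is used as a hypothetical stand-in for both $F_1$ and $F_2$, and the function names, key values, and node identifiers are assumptions made for illustration only. Enumerating combinations is practical only for small $\lambda$; a deployment would presumably unrank the index directly.

```python
import hashlib
import hmac
from itertools import combinations, islice
from math import comb


def derive_key_set(parent_key, child_id, lam):
    """X_PN = {F1(k_P, N, i) : i = 1..lam}, with HMAC-SHA256 standing in for F1."""
    return [hmac.new(parent_key, child_id + b"|%d" % i, hashlib.sha256).digest()
            for i in range(1, lam + 1)]


def select_subset(child_key, parent_id, x_pn, delta):
    """Pick the positions of Y_PN inside X_PN: the i-th size-delta subset,
    where i is derived from (k_N, P) as a stand-in for F2."""
    total = comb(len(x_pn), delta)
    digest = hmac.new(child_key, parent_id, hashlib.sha256).digest()
    index = int.from_bytes(digest, "big") % total
    chosen = next(islice(combinations(range(len(x_pn)), delta), index, None))
    return list(chosen)


lam, delta = 6, 2
x_pn = derive_key_set(b"controller-shared-key-of-P", b"N", lam)
positions = select_subset(b"controller-shared-key-of-N", b"P", x_pn, delta)
y_pn = [x_pn[i] for i in positions]                               # keys given to N
y_bar_pn = [k for i, k in enumerate(x_pn) if i not in positions]  # keys only the controller checks
print(positions, len(y_pn), len(y_bar_pn))
```

Because both derivations are deterministic functions of the shared keys and identifiers, the controller does not need to store the key sets and can recompute them on demand.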

3) Sending and Receiving. Let id be the identifier of the current source space $\Pi^S$. When a node P sends a packet $\mathbf{y}$ to its downstream node N, besides the id, it has to send along $\lambda$ tags, which are computed using the Mac algorithm and the keys in $\mathcal{X}_{PN}$. Let $G_{PN}(\mathbf{y})$ denote this set of tags. Node P sends $(\mathrm{id}, \mathbf{y}, G_{PN}(\mathbf{y}))$. When node N receives this packet from node P, it uses $\mathcal{Y}_{PN}$ and the Verify algorithm to check the validity of $\delta$ out of the $\lambda$ tags of $G_{PN}(\mathbf{y})$. It drops $\mathbf{y}$ if any of the checked tags is invalid. Otherwise, it stores the received tuple in its buffer.

4) Pollution Detection and Alert. Detection of the pollution is needed to start the locating process. Here, we use our detection scheme to provide the detection. Nevertheless, we stress that our locating scheme does not depend on any particular detection scheme. Using our detection scheme, a node N, upon detecting pollution, sends an alert (id || N) to the controller through an authenticated channel achieved using a traditional MAC scheme, e.g., HMAC, and a key that N shares with the controller. When the controller receives an alert, it determines whether id has been reported before; if so, it ignores the alert. Otherwise, it sends a request (id) to each node N through an authenticated channel achieved with HMAC and the key shared with N. This request demands that each node report its incoming subspaces.

5) Reporting Subspaces. Upon receiving the request (id) from the controller, each node N checks whether it has received a similar request for the same id before; if it has, it ignores the request. Otherwise, it prepares the report as follows. For each parent node P, let $(\mathbf{y}_1, t_{1,1}, \cdots, t_{1,\lambda}), \cdots, (\mathbf{y}_l, t_{l,1}, \cdots, t_{l,\lambda})$ be the packets of source space id and their tags that node N received from node P. Node N sends to the controller, through an authenticated channel achieved with HMAC and the key it shares with the controller, the report $(N \,\|\, P \,\|\, \mathbf{y}_r \,\|\, t_1 \,\|\, \cdots \,\|\, t_\lambda)$, where $\alpha_i \xleftarrow{R} \mathbb{F}_q$ ($i \in [l]$); $\mathbf{y}_r = \sum_{i=1}^{l} \alpha_i \mathbf{y}_i$; $t_j = \sum_{i=1}^{l} \alpha_i t_{i,j}$ ($j \in [\lambda]$).
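The report computation can be sketched in Python as follows. The field size `Q`, the function name `build_report`, and the sample packets and tags are assumptions for illustration; the tags are treated as opaque field elements that were received from the parent.

```python
import random

Q = 257  # illustrative prime field size


def build_report(packets, tags, q=Q, rng=random):
    """Draw random coefficients and combine packets and tags column-wise.

    packets: the l vectors y_i received from one parent
    tags:    l lists, each holding the lam tags t_{i,1..lam} of y_i
    """
    alphas = [rng.randrange(q) for _ in packets]
    y_r = [sum(a * y[k] for a, y in zip(alphas, packets)) % q
           for k in range(len(packets[0]))]
    t_r = [sum(a * t[j] for a, t in zip(alphas, tags)) % q
           for j in range(len(tags[0]))]
    return alphas, y_r, t_r


packets = [[1, 2, 3, 1, 0], [4, 5, 6, 0, 1]]
tags = [[10, 20, 30], [40, 50, 60]]  # lam = 3 tags per packet (made-up values)
alphas, y_r, t_r = build_report(packets, tags)
print(len(y_r), len(t_r))
```

Each reported tag $t_j$ is exactly what Combine would output for the $j$-th key, so the controller can check it with the Verify algorithm under the corresponding key.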

6) Locating the Attackers. After sending out the requests, the controller waits for the reports. After $\Delta t$ seconds, it starts identifying the pollution attackers. It classifies any node that does not report all of its incoming spaces as a malicious node. It only accepts reports with at least $\theta$ valid SpaceMac tags, where the validation uses the keys in the $\bar{\mathcal{Y}}_{PN}$'s. It then identifies the polluted edges in the network based on the reported spaces and the source space. We note that checking if a reported space is polluted can be done quickly and efficiently in $O(mn)$ multiplication operations by leveraging the global coding coefficients of the reported packet and the source packets. Finally, any node that does not have a polluted incoming edge but has a polluted outgoing edge is classified as malicious.
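The final classification rule can be sketched in a few lines of Python. The function name and the small example topology are hypothetical (the topology is not the one in Fig. 4), and the sketch assumes the set of polluted edges has already been determined from accepted reports.

```python
def locate_attackers(edges, polluted):
    """Return nodes with no polluted incoming edge but at least one polluted outgoing edge.

    edges:    iterable of directed edges (u, v)
    polluted: set of edges classified as polluted from the accepted reports
    """
    nodes = {u for u, _ in edges} | {v for _, v in edges}
    attackers = set()
    for node in nodes:
        has_polluted_in = any(v == node for (_, v) in polluted)
        has_polluted_out = any(u == node for (u, _) in polluted)
        if has_polluted_out and not has_polluted_in:
            attackers.add(node)
    return attackers


# Hypothetical topology: S feeds A and B, both feed C, and C feeds receiver R.
edges = [("S", "A"), ("S", "B"), ("A", "C"), ("B", "C"), ("C", "R")]
polluted = {("B", "C"), ("C", "R")}  # B injects pollution, which then propagates through C
print(locate_attackers(edges, polluted))
```

Node C forwards polluted data but is not flagged, because it has a polluted incoming edge; only B satisfies both conditions.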

7) Releasing the Result. After identifying the set of attackers $\mathcal{A}$, the controller sends $(\mathcal{A})$ to each benign node N through an authenticated channel achieved with HMAC and the key that N shares with the controller. Upon receiving $(\mathcal{A})$, each node N adds the nodes in $\mathcal{A}$ to its blacklist. In subsequent communication, no node in the network will send traffic to or receive traffic from the nodes in its blacklist. The controller itself removes the nodes in $\mathcal{A}$, as well as the incoming and outgoing edges of these nodes, from the network topology. Note that the MAC keys used in the detection scheme do not need to be refreshed when a node is removed from the network. This is because the parent-child relationship between any pair of the remaining adjacent nodes is the same as before the removal.

VIII. Security Analysis 

A. Single Adversary 

We described how our detection and locating schemes work in the presence of a single adversary when we presented the schemes. We refer the reader to Section VI and Section VII for the details.

B. Multiple Adversaries 
1) Detection Scheme: 

a. Independent Adversaries: We consider adversaries as independent when every adversary checks the integrity of the packets it receives from its parents and drops corrupted packets. This scenario is similar to the single adversary scenario. In-network detection works because if a node N wants to pollute the network, it still has to forge a valid SpaceMac verification tag for a packet that lies outside of its received space $\Pi_N$, which is computationally difficult.

b. Colluding Adversaries: We first consider the passive colluding scenario, where there is an adversary who does not drop corrupted packets that it receives from its parents. In this scenario, in-network detection no longer works. To see this, consider the case where N receives from one of its parents, P, a corrupted packet $\{t_{\mathbf{y}}^{N} \,\|\, t_{\mathbf{y}}^{P} \,\|\, \mathbf{y}\}$ with a correct helper tag, $t_{\mathbf{y}}^{N}$, and an invalid verification tag, $t_{\mathbf{y}}^{P}$, and N does not drop this packet. To propagate the pollution to its child C, it simply computes an appropriate helper tag, $t_{\mathbf{y}}^{C}$, for this packet using $k_C$, then forwards the packet $\{t_{\mathbf{y}}^{C} \,\|\, t_{\mathbf{y}}^{N} \,\|\, \mathbf{y}\}$. Clearly, $t_{\mathbf{y}}^{N}$ is a valid tag for $\mathbf{y}$; hence, $\mathbf{y}$ passes C's verification, and $t_{\mathbf{y}}^{C}$ is a valid helper tag for $\mathbf{y}$ which will pass any verification by a child of C.

We now consider the active colluding scenario, where a node N can collude with one of its parents, P, to learn the private key, $k_N$, that is used for verification by its children. In this scenario, in-network detection also fails. This is because, knowing the secret key $k_N$, N can generate a valid verification tag for any packet outside of its received space $\Pi_N$; thus, any child C of N would not be able to detect corrupted packets sent by N.

Note that in both cases where the in-network detection fails, the adversaries must be adjacent to each other. In both of these cases, the end-to-end detection made by the receivers comes to the rescue. This end-to-end detection is reliable because the receivers are trusted and the private key $k^*$ shared by the source and the receivers is not known to any adversary. In order to pass the verification done by the receivers, an adversary has to forge a valid SpaceMac tag, which is computationally difficult. We discuss how we could relax the assumption of trusted receivers in Section VIII-E.

2) Locating Scheme: 
In the presence of multiple adversaries, an attacker may be "in 
the shadow" of some other attackers, which means that it may 
pollute only already polluted data and thus does not produce 
any detectable effect. More precisely, we define shadowed and 
exposed attackers below. 

Definition 3 (Adapted from [19] and [20]). An attacker is shadowed if it has at least one polluted incoming edge and is exposed otherwise.

a. Independent Adversaries: In this case, we note that with 
high probability, our approach is already able to identify all 
exposed attackers. We utilize the following observation to 
identify all shadowed and exposed attackers. 

Lemma 7. For any directed acyclic graph in which a pollution attack is present, there is at least one exposed attacker.

Proof: Consider a topological ordering of the graph. The first malicious node in the ordering is an exposed attacker. ■

Exploiting this, we can use multiple generations, i.e., transmissions of (different) source spaces, to identify all attackers.

Lemma 8. In a network with $\eta$ independent attackers, with high probability (depending on $q$, $\lambda$, $\delta$, and $\theta$), all attackers can be identified after $\kappa$ generations which experience pollution attack, where $\kappa \le \eta$.

Proof: Since there is at least one exposed attacker per generation by Lemma 7, our scheme can identify at least one attacker per generation. Because the identified attackers are immediately excluded from future communication and the other attackers are persistent, subsequently identified attackers are different from the already identified ones. Therefore, it takes at most $\eta$ generations to identify all attackers. ■
Note that we consider the cases where there exists an 
attacker who is disconnected from the receivers after the 
removal of all other attackers as degenerate cases. This is 
because the disconnected attacker is no longer able to pollute 
the network. In this case, the location of all attackers cannot 
be determined. 
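
The argument behind Lemmas 7 and 8 can be illustrated with a small sketch. It is not the paper's locating protocol: it assumes a simplified pollution model (every attacker pollutes all of its outgoing edges, and benign nodes pass pollution on), and the function names `polluted_edges`, `exposed_attackers`, and `generations_to_locate` are hypothetical. In each generation it removes the exposed attackers and counts how many generations are needed.

```python
from graphlib import TopologicalSorter


def polluted_edges(edges, attackers):
    """Return the edges that carry pollution.

    Modeling assumption (not from the paper): every attacker pollutes all of
    its outgoing edges, and a benign node that receives pollution passes it on.
    """
    preds = {}
    for u, v in edges:
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)
    dirty = set()  # nodes whose outgoing edges are polluted
    for node in TopologicalSorter(preds).static_order():
        if node in attackers or preds[node] & dirty:
            dirty.add(node)
    return {(u, v) for u, v in edges if u in dirty}


def exposed_attackers(edges, attackers):
    """Attackers with no polluted incoming edge (Definition 3)."""
    tainted_heads = {v for _, v in polluted_edges(edges, attackers)}
    return {a for a in attackers if a not in tainted_heads}


def generations_to_locate(edges, attackers):
    """Count generations until every attacker has been exposed and removed."""
    remaining, live = set(attackers), list(edges)
    generations = 0
    while remaining:
        found = exposed_attackers(live, remaining)  # non-empty by Lemma 7
        remaining -= found
        # Identified attackers are excluded from future communication.
        live = [(u, v) for u, v in live if u not in found and v not in found]
        generations += 1
    return generations


# Hypothetical topology: attackers A, B, E lie on one path, so each one
# shadows the next and exactly one attacker is exposed per generation.
edges = [("S", "A"), ("S", "B"), ("S", "E"),
         ("A", "B"), ("B", "E"), ("E", "R")]
print(exposed_attackers(edges, {"A", "B", "E"}))
print(generations_to_locate(edges, {"A", "B", "E"}))
```

In this example the bound of Lemma 8 is tight (three attackers, three generations); when attackers sit on disjoint paths, several of them are exposed at once and fewer generations suffice.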

b. Colluding Adversaries: We note that each pair of parent
P and child N uses distinct key sets $X_{PN}$ and $Y_{PN}$; thus,
the collusion of malicious nodes does not provide knowledge
about the key sets of benign nodes. However, when the
distance between any two attackers equals one, where
distance refers to the length of the shortest path connecting
two nodes, these attackers can collude to report a false space.

Assume that in a network, there are colluding attackers
P and N connected by a directed edge e(P, N) and there
is no other pair of attackers in the network at distance
one. We ask the question: "What can P and N achieve by
manipulating edge e(P, N)?" Consider a topological ordering
O of the nodes. If N makes $e(P, N) \in \mathcal{E}_p$, the set of
polluted edges identified by the controller, then P will be
exposed and identified after all malicious nodes that come
before P in O are identified. After P is located, N and the
rest of the attackers will eventually be located. Otherwise, if N





[Fig. 5 panels: A is identified in generation 1; B is identified
in generation 2; E is identified in generation 3.]

Fig. 5. An example where there are three attackers A, B, and E. Attackers
A and B collude to make edge e(B, A), which is polluted, appear non-polluted.
Nevertheless, all are identified after 3 generations.



makes $e(P, N) \in \mathcal{E}_s$, the set of non-polluted edges identified
by the controller, then N will be exposed and located after
all malicious nodes that come before N in O are located.
Analogous to the other case, after N is located, P (if not
already located) and the rest of the attackers will eventually
be located. Consequently, by manipulating the status of edge
e(P, N), the attackers can, at best, change the order in which
P and N are located. The above analysis can be extended
to the general case where there are multiple pairs at
distance one by first considering the pair (P, N), where N has
a polluted outgoing edge, that appears first in O. Fig. 5
shows an example. As a result, we can generalize Lemma 8.

Lemma 9. In a network with $\eta$ attackers, with high probability
(depending on $q$, $\lambda$, $\delta$, and $\theta$), all attackers can be identified
after $\kappa$ generations which experience a pollution attack, where
$\kappa \le \eta$.

C. Tag-Pollution Resistance 

As pointed out by Li et al. [11], a scheme that uses multiple
MAC tags, such as [10], [11], may suffer from tag pollution
attacks. In these schemes, a packet carries multiple tags and
each node only has keys to verify a subset of them; therefore,
an adversary may tamper with some of the tags which only get
verified far down the information flow. The consequence is that
a packet with some corrupted tags may pass the verification
of a few levels of nodes. When mixed with other packets, one
corrupted tag may snowball into a large number of corrupted
tags. The packets carrying these corrupted tags eventually fail
the authentication downstream, thus wasting resources of
the network. This effectively emulates a pollution attack.

Both our detection scheme and locating scheme use mul- 
tiple MAC tags; fortunately, our schemes are resistant to tag 
pollution attack. More specifically, in our detection scheme, 
each packet carries three tags: one end-to-end tag, one helper 
tag, and one verification tag. An adversary cannot tamper with 
the end-to-end tag because the verification tag of a packet is 
computed over the concatenation of both the content of the 
packet and its end-to-end tag. In other words, if the adversary 
tampers with the end-to-end tag of a packet, the packet will not 
pass the verification test made by the immediate downstream 
nodes. This idea of using nested tags was originally introduced
in [11]. Clearly, the adversary cannot tamper with the
verification tag of a packet since this verification tag is checked 
immediately by one of its children. Finally, the adversary may 
tamper with the helper tag of a packet it sends to its child, 
e.g., attaches an erroneous helper tag. In this case, since the 
child uses the helper tag to compute verification tags for its 



outgoing packets, any packet involving the packet with an 
erroneous helper tag, sent by the child, will have a corrupted 
verification tag. The next hop that receives packets from 
this child will drop any packet with a corrupted verification 
tag. Note that no other tags, besides the end-to-end tags, 
travel more than two hops in our detection scheme. This 
eliminates the scenarios where tags are only verified far down
the stream as in a tag pollution attack. Our locating scheme
uses $\lambda$ tags; however, these tags are never forwarded to any
next hop other than the controller (only during the locating
process). Furthermore, Lemma 6 ensures that an adversary
can only trick the controller by sending erroneous tags with 
negligible probability. As a result, our locating scheme is also 
not susceptible to tag pollution attacks. 

D. Denial of Service Attack 

In our locating scheme, once the controller receives a
pollution alert from a node, it triggers the locating
process. This involves collecting report vectors from every
node in the network. An adversary can exhaust the resources of
the network by flooding the controller with alerts. Our locating
scheme can combat this denial of service attack in two
ways. First, recall that the locating process is triggered only
once per generation, since the controller issues only one report
request per unique generation id. This already limits the effect
of the attack. Second, the controller can maintain a counter,
ctr, per node. Every time a node N reports (id || N) and it
turns out that there is no polluted edge, ctr is incremented.
The controller then ignores any report from N for a period of
time if ctr exceeds a certain threshold $\tau$. After this period of
time, the counter resets. Note that a malicious node N cannot
pretend to be another node N' when sending an alert, i.e.,
sending (id || N'), because the alert will be authenticated by the
controller using the private key, $k_{N'}$, shared by the controller
and node N'.
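
The two countermeasures can be sketched as a small piece of controller-side bookkeeping. This is only an illustration: the class `AlertLimiter`, its method names, and the use of a wall-clock ignore period are assumptions, and alert authentication with the per-node key is omitted.

```python
import time


class AlertLimiter:
    """Per-node false-alarm counter plus one locating run per generation."""

    def __init__(self, threshold, mute_seconds, clock=time.monotonic):
        self.threshold = threshold         # tau in the text
        self.mute_seconds = mute_seconds   # length of the ignore period
        self.clock = clock
        self.ctr = {}                      # node -> false-alarm counter
        self.muted_until = {}              # node -> end of ignore period
        self.seen_generations = set()      # generation ids already handled

    def _is_muted(self, node):
        until = self.muted_until.get(node)
        if until is None:
            return False
        if self.clock() >= until:          # period over: reset the counter
            del self.muted_until[node]
            self.ctr[node] = 0
            return False
        return True

    def should_start_locating(self, node, generation_id):
        """True if an (already authenticated) alert should trigger locating."""
        if self._is_muted(node) or generation_id in self.seen_generations:
            return False
        self.seen_generations.add(generation_id)
        return True

    def record_false_alarm(self, node):
        """Call when locating found no polluted edge for this node's alert."""
        self.ctr[node] = self.ctr.get(node, 0) + 1
        if self.ctr[node] > self.threshold:
            self.muted_until[node] = self.clock() + self.mute_seconds
```

A controller would call `should_start_locating` for every alert and `record_false_alarm` whenever the locating process finds no polluted edge. Passing a custom `clock` makes the ignore period easy to test.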

E. Malicious Receivers 

The assumption that the receivers are trustworthy safeguards 
our detection scheme against the scenarios where there are 
adjacent colluding adversaries. Here we discuss several avail- 
able options we could adopt when we relax this assumption 
and consider the case where some (but not all) receivers are 
malicious. 

One option is to use a separate key for each receiver. When
some of the receivers are malicious, it is necessary for the
source to share with each receiver, $R_i$, a separate secret key,
$k_i^*$, instead of having all receivers and the source share a
single key $k^*$. This is because if only one key $k^*$ is used,
a malicious receiver can leak the key to an intermediate node;
as a result, this node can generate a valid end-to-end tag
for any corrupted packet. With this option, for each $k_i^*$, the
source generates a different end-to-end SpaceMac tag; thus, an
honest receiver is still able to detect the pollution attack. This
approach clearly increases the communication overhead of the
end-to-end detection by a factor of $|\mathcal{R}|$, as a packet now carries
$|\mathcal{R}|$ end-to-end tags instead of one. This option works when the
number of receivers in the network is small.



Another option is to allow alternative ways of detection. 
Recall that our locating scheme works independently of any
detection scheme used. If an honest receiver, R, receives a 
corrupted packet, besides relying on the end-to-end MAC 
tag to detect corruption, it can also use other knowledge for 
detection. For instance, as soon as the packets R received 
form an inconsistent system of equations, R knows there is 
an attack. Also, R can rely on application-level information 
to determine corrupted packets. For instance, assume R is 
able to solve the system but it gets corrupted packets after 
solving the system, and assume that this is a video packet. The 
corrupted packet is very likely not compliant with the expected 
video codec. Using this application-level information, R can 
detect the pollution as well. As soon as R detects the pollution 
attack and alerts the controller, the locating process kicks in. 
Recall that for any generation which experiences a pollution
attack, our locating scheme can eliminate at least one attacker.
A round of elimination by the locating scheme may break
the adjacency property of the attackers, thus enabling the in- 
network detection to work in the next generation. 

IX. Performance Evaluation 

In this section, we evaluate the performance of our detection 
and locating schemes. We also compare the overhead of 
our schemes to recently proposed schemes. In addition, we 
implement SpaceMac as an open-source library and we make 
it available online. Finally, we simulate the scenario when 
there are multiple adversaries and show that our locating 
scheme can eliminate all of them within a few generations. 

A. Key Management Overhead 

We compare the number of MAC keys that each verifying 
node needs to maintain in our defense system to that required 
by state-of-the-art schemes. We start by comparing the
overhead of our detection scheme to those of the other MAC-based
detection schemes [10], [11]. Similar to our detection scheme,
the schemes in [10] and [11] also require each node to manage
multiple keys. The number of keys could be large in both
[10] and [11]. In particular, in [10], the number of keys each
node maintains is, on average, no less than '=''p('^+i) 
where c is the collusion parameter. Hence, the larger c is,
and/or the larger the number of nodes the network has, the
more keys are needed. In RIPPLE [11], keys expire
periodically and new keys are needed frequently; the number
of keys increases linearly in the number of time intervals as
the transmission progresses. Clearly, the more time it takes
to transmit a generation, and/or the more generations are
transmitted, the larger the number of keys needed. In stark
contrast to [10] and [11], the number of keys a node
needs to manage in our detection scheme depends on neither
the transmission time nor c: it equals the number of
parents and children the node has (plus one). The number of
children and parents of a node may or may not depend on the
network size, depending on the network topology. Finally, the
number of keys needed for our locating scheme is equal to
that of the scheme in [20]. However, we stress that our keys
can be used for multiple generations, while this is not the case
in [20] (due to replay attacks).

TABLE III

The probability that a malicious parent succeeds in preventing its child
from reporting, Pr[P], the probability that a malicious child succeeds in
disparaging its parent, Pr[N], and the space overhead corresponding to
different parameter sets.



B. Communication Overhead 

Communication overhead refers to the additional network 
bandwidth that our schemes introduce to the system. For both 
of the schemes, we neglect the bandwidth of the bootstrapping 
phase, where symmetric keys are distributed, as this can be 
done offline. 

Detection scheme. For the online overhead per packet,
our detection scheme requires each packet to carry three
SpaceMac tags: an end-to-end tag, a helper tag, and a
verification tag. Each tag is a symbol in the field $\mathbb{F}_q$; hence, the
total overhead is $3\lceil \log_2 q \rceil$ bits. Our communication
overhead is fixed, regardless of the network topology.

Unlike our scheme, the online overhead of the scheme
proposed by Li et al. [11] varies depending on the network
level. In particular, in [11], the authors define the level of a
node N as the length of the longest path from S to N. The
network level $\ell$ is defined as the maximum among the levels
of the nodes. In their scheme, each packet carries $\ell$ MAC tags
initially, then one or more tags are peeled off at every node
the packet goes through. The average overhead is therefore
approximately $\frac{\ell}{2}\lceil \log_2 q \rceil$ bits, which is linear in the network
level.

In [10], to achieve security $1/q^d$ and c-collusion resistance,
i.e., security against any c colluding attackers, each packet
carries $|X|$ MAC tags and each node verifies $|B|$ tags, where
$(X, B)$ is a $(c, d)$-cover free family. For instance, to provide



security ^ and 2-collusion resistance, each packet needs to 
carry 49 tags and each node verifies 7 out of these 49 tags
[10]. The overhead is $49\lceil \log_2 q \rceil$ bits in this case, or
$|X|\lceil \log_2 q \rceil$ bits in general.

Compared to these two schemes, our detection scheme is
able to provide in-network detection with significantly less
communication overhead for two main reasons: (i) we
exploit the local subspace property (Lemma 1) and (ii) we delegate
the handling of colluding attackers to our locating scheme.
Table II summarizes the overhead of our detection scheme in
comparison to the other two schemes, along with the supported
features.

Locating scheme. The communication overhead of our
locating scheme includes the $\lambda$ tags carried by each packet,
the reporting vectors sent by the nodes, and the announcement
(containing the identified adversaries) sent by the controller.
The online overhead, which depends on the number of packets
sent in the network, is the first one; the latter two overheads
exist only when a pollution attack is detected, and thus
are asymptotically negligible in the number of packets. The
overhead per packet is $\lambda \lceil \log_2 q \rceil$ bits. Table III shows that with
an overhead of about 20 bytes per packet, the probability that
a malicious parent succeeds in preventing its child from reporting,
Pr[P], and the probability that a malicious child succeeds in
disparaging its parent, Pr[N], are both very small.

Compared to the scheme by Wang et al. [20], we have the
same amount of online overhead per packet ($\lambda$ tags). However,
in [20], when a pollution attack is detected, the controller has
to compute multiple checksums for the polluted generation
and send these checksums to all the nodes. Each checksum
includes m symbols in $\mathbb{F}_q$ (recall that m is the number of
packets per generation). If $\mu$ checksums are computed ($\mu > 1$
to improve the security guarantee), the overhead incurred in
sending the checksums to all nodes is $|V|\,\mu\, m \lceil \log_2 q \rceil$ bits.
In contrast, our locating scheme does not need this checksum
dissemination.

Combined Scheme. The total online communication
overhead of our defense scheme is $(3 + \lambda)\lceil \log_2 q \rceil$ bits. Note
that this overhead depends on neither the packet size nor the
generation size; hence, it becomes relatively cheaper when the
packet size is large. For instance, for $\lambda = 19$, $n = 1024$,
$m = 32$, $q = 2^8$, the per-packet communication overhead is only 2%
while the security of the detection is 2^^ and of the security 
of the locating is 2^^". Most importantly, standing in stark 
contrast to the other two schemes, [10] and [11], our overhead
is constant in terms of the network size and the number of
attackers.
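
As a quick check of the 2% figure, the sketch below evaluates $(3 + \lambda)\lceil \log_2 q \rceil$ for the quoted parameters. It assumes that a coded packet consists of $n + m$ field symbols, and the function names are illustrative only.

```python
from math import ceil, log2


def tag_overhead_bits(lam, q, detection_tags=3):
    """Online tag overhead per packet: (3 + lambda) * ceil(log2 q) bits."""
    return (detection_tags + lam) * ceil(log2(q))


def relative_overhead(lam, n, m, q):
    """Tag overhead relative to a packet of n + m symbols (assumed layout)."""
    packet_bits = (n + m) * ceil(log2(q))
    return tag_overhead_bits(lam, q) / packet_bits


bits = tag_overhead_bits(lam=19, q=2**8)
ratio = relative_overhead(lam=19, n=1024, m=32, q=2**8)
print(bits, "bits =", bits // 8, "bytes per packet")
print(f"{100 * ratio:.2f}% of the packet size")
```

With these parameters the tags take 22 bytes on top of a 1056-byte packet, which is where the roughly 2% comes from.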



C. Computation Overhead 

The major computational overhead of both of our schemes
comes from the algorithms Mac, Combine, and Verify performed
at each node for every packet. The computation cost for
the bootstrapping, reporting, and locating steps is a one-time
cost for a generation and thus is asymptotically negligible
in the number of packets. We subsequently focus on the
online computation cost incurred by the three algorithms of
SpaceMac.

Both the Mac and the Verify algorithms incur one PRG
call, m PRF calls, and (n + 2m) finite field multiplications.
Note that the results of both the PRG and PRF calls can be
cached and used for the whole generation. Thus, they can be
considered as a one-time cost as well. If we let w be the
average number of packets combined by each node, then the
Combine algorithm incurs w multiplications on average.

Detection Scheme. The operations performed by each 
node in the detection scheme for each packet y include (i) 
verifying the integrity of y using the Verify algorithm, (ii) 



combining the received helper tags to generate a verification 
tag for an outgoing packet z using the Combine algorithm, 
(iii) computing a helper tag for z using the Mac algorithm. 
If the node is the source, it needs to compute the end-to-end 
tag using the Mac algorithm; however, in this case it does 
not need to verify the integrity of packets. If it is a receiver, 
it needs to verify the end-to-end tag by performing another 
Verify algorithm. The worst case computational overhead, i.e., 
when the node is a receiver, in terms of the number of finite 
field multiplications is 3 (n + 2m) + w. 

Locating Scheme. For each packet received, each node
verifies $\delta$ tags using the Verify algorithm. For each packet it
sends out, each node needs to compute $\lambda$ tags using the Mac
algorithm. The total overhead is therefore $(\delta + \lambda)(n + 2m)$
multiplications.

Combined Scheme. The overall computational overhead
per packet per node of our combined scheme is
$(3 + \delta + \lambda)(n + 2m) + w$ multiplications.

Comparison. The computational overhead per node per
packet of the scheme proposed by Li et al. [11] includes one
Combine and one Verify operation. Based on the closed-form
formulas provided in 01 1|, this overhead is w(^^) + {n + 
m+^=^) number of multiplications. Finally, the computational 
overhead per node per packet of the scheme proposed by
Agrawal and Boneh [10] includes one Combine and $|B|$ Verify operations,
which is $w|X| + |B|(n + 2m)$ multiplications.
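
The multiplication counts for our schemes and for the scheme of Agrawal and Boneh can be evaluated directly from the formulas in this subsection. The sketch below does so for the parameters used in the concrete comparison; the function names are illustrative, and the RIPPLE count is left out.

```python
def detection_mults(n, m, w):
    """Worst case (receiver): three Mac/Verify runs plus one Combine."""
    return 3 * (n + 2 * m) + w


def locating_mults(n, m, lam, delta):
    """delta Verify runs per received packet, lambda Mac runs per sent packet."""
    return (delta + lam) * (n + 2 * m)


def combined_mults(n, m, w, lam, delta):
    """Detection and locating together."""
    return (3 + delta + lam) * (n + 2 * m) + w


def agrawal_boneh_mults(n, m, w, num_tags, tags_verified):
    """One Combine over |X| tags plus |B| Verify runs."""
    return w * num_tags + tags_verified * (n + 2 * m)


n, m, w = 1024, 32, 4
print(detection_mults(n, m, w))                   # 3268
print(combined_mults(n, m, w, lam=19, delta=9))   # 33732
print(agrawal_boneh_mults(n, m, w, 49, 7))        # 7812
```

These values agree with the "# Multiplications" entries reported in Table IV.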

For a concrete comparison, let $q = 2^8$, $n = 1024$, $m = 32$,
$w = 4$, $\ell = 9$ (for an average network of size 100, note
that $\log_2 100 \approx 6.6$). Table IV shows the set of appropriate



parameters and their corresponding computation overhead to
achieve the same security level for all schemes. Following [10], we
implement multiplication in $\mathbb{F}_{2^8}$ by creating an offline
multiplication table, storing all $2^{16}$ products of pairs of elements in
this field. The table only occupies about 64 KB in C/C++ and
128 KB in Java (since there is no 8-bit primitive data type
in Java that has values from 0 to $2^8 - 1$). This table enables
us to achieve fast multiplication, which is now just a table
lookup. Note that this approach is not possible (space-wise)
when working with a large field; for instance, any scheme that
relies on public key cryptography, such as, p2)-p4), requires 
a large field size, i.e., $\lceil \log_2 q \rceil > 128$. The details about the



platforms we use are provided in Section IX-D. The numbers
reported in Table IV are averaged over 10^ multiplications. 
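
The table-lookup approach can be sketched as follows. This is not the library's code: the reduction polynomial 0x11B (the AES polynomial) is an assumption, since the text does not say which irreducible polynomial is used, and the function names are hypothetical.

```python
def gf256_mul_slow(a, b, poly=0x11B):
    """Shift-and-add multiplication in GF(2^8), reducing modulo `poly`.

    Assumption: 0x11B (x^8 + x^4 + x^3 + x + 1) is used only for illustration.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a        # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:          # degree reached 8: reduce
            a ^= poly
        b >>= 1
    return result


def build_mul_table(poly=0x11B):
    """Precompute all 2^16 products; one byte each, so 64 KB in total."""
    table = bytearray(256 * 256)
    for a in range(256):
        for b in range(256):
            table[(a << 8) | b] = gf256_mul_slow(a, b, poly)
    return table


MUL = build_mul_table()


def gf256_mul(a, b):
    """Multiplication as a single table lookup."""
    return MUL[(a << 8) | b]


print(len(MUL))                      # 65536 entries, i.e. 64 KB
print(hex(gf256_mul(0x57, 0x83)))    # 0xc1 with the assumed polynomial
```

Storing one byte per product gives exactly $2^{16}$ bytes, which matches the 64 KB figure quoted for C/C++.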

As shown in Table IV, the computational latency of our
detection scheme is of the same order of magnitude as the
other two detection schemes. Table IV also shows that the
latency of our combined detection-locating scheme is about
10 times higher than that of our detection scheme, or 30 times
higher than that of RIPPLE.

This is the trade-off when one wants to locate and eliminate 
all attackers. If one chooses not to locate and eliminate the 
attackers, they may keep flooding their child nodes with 
corrupted packets along with their MAC tags. This not only 
wastes the child nodes' download bandwidth, but also exhausts 
their computational resources since they need to constantly 
run the verification algorithm on the corrupted packets. Fur- 
thermore, the attackers may only send out corrupted packets 
but not valid packets. This means that all packets they receive
from their parent nodes are not used at all, which implies
that the upload bandwidth of the parent nodes is also wasted.
Clearly, the more parents and children the attackers have,
and/or the larger the number of attackers the network has, the
more resources are wasted due to the attack.

Nevertheless, we note that even though the combined 
detection-locating scheme has one order of magnitude larger 
computational delay than other stand-alone detection schemes, 
its delay is still very small, in the order of sub-millisecond on 
a PC or millisecond on a resource-constrained Android phone 
(Samsung Captivate). Therefore, when operating on PCs or 
smart phones, we strongly recommend using our full scheme. 
In scenarios where the network cannot afford the computation
overhead of the locating scheme, each node can keep a
per-parent count of the corrupted packets it has detected so far,
and refuse to receive packets from a parent once that count
crosses a threshold. This reduces the waste of the node's download
bandwidth and CPU time.

D. Library 

We implement all three algorithms of SpaceMac in both
C/C++ and Java and provide them as a library. As mentioned
before, we implement field multiplication using a look-up
table. We implement addition as a simple XOR operation.
Finally, we implement PRF and PRG using AES in CBC
mode of operation. The AES implementation is provided by
the standard crypto library [29] for the Java implementation and
the Crypto++ open-source library [30] for the C/C++ implementation.
We make our SpaceMac library available online along with the
source code fT\\. 

This library is useful for those who want to adopt our
SpaceMac scheme into their system, or those who want to
deploy our proposed defense scheme. The C/C++ implementation
is faster than the Java implementation; it is meant to
be used by low-level or embedded devices, such as network
routers. The Java implementation, meanwhile, is useful for
high-level application-layer programs, such as peer-to-peer
applications. Furthermore, the Java implementation is ready
to be run on the current Android OS (Android 2.2 Froyo).
This provides support for the growing number of implementations of network
coding on smart phones, such as, the work in pT] , p2) , and 

Table V provides the benchmark of all three algorithms
of SpaceMac. For the benchmark, we set $n = 1024$, $m = 32$,
$w = 4$. Except for the Android benchmark, both the C/C++
and Java implementations were run on a PC with a quad-core
2.8 GHz processor and 32 GB of RAM. Our Android device,

Network params. (all schemes): $q = 2^8$, $n = 1024$, $m = 32$, $w = 4$, $\ell = 9$

                    RIPPLE [11]   Agrawal-Boneh [10]             Our detection   Our detection + exact locating
Scheme params.      -             $|X| = 49$, $|B| = 7$, $c = 2$   -               $\lambda = 19$, $\delta = 9$, $\theta = 3$
Security            2-
# Multiplications   1,096         7,812                          3,268           33,732
C/C++ (μs)          5.5           39.1                           16.3            168.7
Java (μs)           6.6           46.9                           19.6            202.4
Android (μs)        116.2         828.1                          346.4           3,575.6

TABLE IV

Online computation overhead per packet per node in terms of the number of finite field multiplications, and computing latency in C/C++, Java, and on an
Android platform (Samsung Captivate)



Number of attackers         4      8      12     16      20
Average # of generations    1.92   3.01   4.16   4.77    5.64
Average delay (ms)          412    647    896    1,031   1,217

TABLE VI

The average number of generations and delay required to detect all attackers
in a network with 50 intermediate nodes.



the Samsung Captivate, has a single 1 GHz processor and 512
MB RAM. The reported values correspond to the averages 
taken over 10^ runs of each algorithm. The most expensive 
operations of Mac and Verify algorithms are the PRF and PRG 
calls; however, we stress that these calls can be done offline. 
For completeness, the reported values include the cost of these 
calls. 

From Table V, we can see that in order to achieve high
security (2"^^), the computational latency of our C/C++ 
implementation is only on the order of hundreds of microseconds.
Moreover, even on the resource-constrained Android
device, the computational latency of Mac and Verify is still
very small, on the order of milliseconds. Note that the
Combine operation is several orders of magnitude faster than 
the Mac and Verify algorithms since it only involves finite field 
multiplications, which are quick table lookups. These results 
demonstrate that SpaceMac algorithms are fast and appropriate 
for practical use. 

E. Simulation 

We implement a simulation in Python of a scenario where
there are multiple colluding attackers in a network. Between
a pair of source and receiver nodes, we generate a random
directed acyclic graph network of 50 nodes
using the pygraph library [34]. The ratio of edges to nodes is
a random number in [1, 5]. All edges have a random end-to-end
delay between 10 and 100 ms. In a single generation, 32
packets in $\mathbb{F}_{2^8}^{1056}$ ($q = 2^8$, $n = 1024$, $m = 32$) are generated
and sent by the source node. The locating process is triggered
multiple times, each time by an alert from the receiver.
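
A topology of the kind described above can be generated with the standard library alone. The sketch below is not the simulation code used for the evaluation and does not use pygraph; the function `random_dag` is hypothetical. It guarantees acyclicity by only adding edges from a lower-numbered node to a higher-numbered one.

```python
import random


def random_dag(num_nodes=50, rng=None):
    """Random DAG with an edge/node ratio in [1, 5] and delays in [10, 100] ms.

    Returns a dict mapping each edge (u, v), with u < v, to its delay in ms.
    Node 0 can play the source and node num_nodes - 1 the receiver; this
    sketch does not check that every node lies on a source-receiver path.
    """
    rng = rng or random.Random()
    ratio = rng.uniform(1, 5)
    max_edges = num_nodes * (num_nodes - 1) // 2
    target = min(int(ratio * num_nodes), max_edges)
    edges = {}
    while len(edges) < target:
        u, v = sorted(rng.sample(range(num_nodes), 2))
        # Edges always point from lower to higher index, so no cycle can form.
        edges.setdefault((u, v), rng.uniform(10, 100))
    return edges


topology = random_dag(50, random.Random(2012))
print(len(topology), "edges")
```

Seeding the generator, as in the usage line, makes a topology reproducible across runs, which is convenient when averaging over many rounds.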

The attackers in the network are chosen randomly from 
the population of 50 nodes in the network in a way that 
each attacker can still pollute the network even when the
remaining attackers are removed. The attackers pollute all of their
outgoing edges. When requested by the controller, most of 
them honestly report their incoming subspaces; however, some 
of them, who have malicious parents, lie about their received 



subspaces from those parents. This emulates the case where 
the attackers collude to manipulate the reports of their incom- 
ing subspaces. 

We evaluate the average number of generations to locate all
$\eta$ attackers in the network, where $\eta$ varies from 4 to 20. For
each $\eta$, we perform the simulation for 100 rounds (varying
the network topology, attacker location, and edge delays) to
get the average value. We also evaluate the average delay
it takes to identify all attackers, where the delay refers to
the time between when the source starts sending and when
all the attackers are located. The results shown in Table VI
indicate that we succeed in locating all attackers very quickly
(about 1.2 seconds for 20 attackers) and after far fewer than $\eta$
generations (about 5.6 generations for 20 attackers).

X. Conclusion 

In this work, we introduce a novel homomorphic MAC 
scheme for expanding space called SpaceMac. We propose 
a cooperative defense system against pollution attacks built 
on SpaceMac. To the best of our knowledge, our system 
is the first that can provide both in-network detection and 
exact locating of the attackers. In addition, our system is 
collusion resistant and tag-pollution resistant. Our evaluation 
results using real implementations in C/C++ and Java on
multiple devices demonstrate that our defense scheme incurs
both low communication and low computation overhead. We
implement SpaceMac as a ready-to-use library and make it
available online.

Appendix
Proof of Lemma 5

Recall that the server checks $\lambda - \delta$ tags, and in order for the
server to accept a report, there must be at least $\theta$ valid tags.
The probability that the child successfully forges a SpaceMac
tag is $\frac{1}{q}$, and so the probability that the child fails to forge
such a tag is $1 - \frac{1}{q}$. Let $i$ be the number of valid tags. The stated
probability is a direct result of enumerating the probability of
success of the child in all the cases.

Proof of Lemma 6

Let $y_r$ denote the random packet of the parent's space
that the child chooses to report to the controller:
$y_r = \sum_{j \in \mathcal{D}} \alpha_j y_j$, where $\alpha_j \in \mathbb{F}_q$ and $\mathcal{D}$ is a subset of indices of
the packets sent from the parent to the child.

Recall that the child is benign and always uses Combine
to generate the tags for $y_r$. Let $x$ denote the number of correctly
computed tags of $y_r$, i.e., the parent uses Mac to compute the
corresponding $x$ tags for every $y_j$, $j \in \mathcal{D}$. The value of $x$
must be smaller than $\delta + \theta$; otherwise the controller will accept
$y_r$, as there are at least $\theta$ valid tags. Let $i$ out of these $x$ tags
be the number of tags verifiable by the child, $i \le \min(\delta, x)$.
Since $y_r$ has $\delta - i$ tags that are not correctly computed and are
verifiable by the child, there are at least $\delta - i$ not correctly computed tags
which are verifiable by the child among the tags of the $y_j$'s. Thus,
the probability that the child accepts all $y_j$'s is at most $\left(\frac{1}{q}\right)^{\delta - i}$.



The remaining $x - i$ tags of $y_r$ are checked by the controller.
These tags of $y_r$ are valid tags; hence, $x - i$ must be smaller
than $\theta$; otherwise the controller will accept the report. Thus,
$x - i < \theta$; hence, $i \ge \max(x - \theta + 1, 0)$. The probability that
the controller rejects $y_r$ equals the probability that there
are fewer than $\theta$ valid tags. Since there are already $x - i$ valid
tags, this probability equals

$$p_{rej} = \sum_{j=0}^{\theta - (x - i) - 1} \binom{\lambda - \delta - (x - i)}{j} \left(\frac{1}{q}\right)^{j} \left(1 - \frac{1}{q}\right)^{\lambda - \delta - (x - i) - j},$$

where $j$ denotes the number of valid tags out of the remaining
$\lambda - \delta - (x - i)$ tags. Note that $p_{rej} \le 1$.

Putting the above values together, the probability that the
child accepts all $y_j$'s which form $y_r$ and the server rejects $y_r$
when there are $x$ correctly computed tags is upper bounded
as follows:



p{x) < 



< 



1 
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i—max{x — 6-\-1.0) 
mm{5,x) 
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i=max(a;-6l+l,0) 
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The best probability is the maximum of p{xys. 



